*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
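This banner is printed by the PyTorch distributed launcher, once per spawned worker, and it typically forces OMP_NUM_THREADS=1 only when the variable is not already set. A minimal sketch of how to tune it before launching; the thread count and the training script name here are illustrative, not the exact command used for this run:

    # set the variable before launching so each worker inherits it
    export OMP_NUM_THREADS=4
    deepspeed pretrain_gpt.py ...   # hypothetical entry point; substitute your own launch command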
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
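The op report above can be regenerated on demand, and the async_io warning carries its own fix. As a hedged sketch: `ds_report` is DeepSpeed's standard diagnostic command and `DS_BUILD_OPS` is a documented install flag, but these are generic options rather than the exact commands used for this run:

    ds_report                              # prints the same op-compatibility report
    apt install libaio-dev                 # the fix suggested by the async_io warning
    DS_BUILD_OPS=1 pip install deepspeed   # optionally pre-build ops instead of JIT-compiling at runtime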
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
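To verify that the environment really matches this report (torch 1.8.1 built for CUDA 11.1, system nvcc 11.2), a quick check from the same env:

    # expect: 1.8.1 11.1
    python -c "import torch; print(torch.__version__, torch.version.cuda)"
    # expect: release 11.2 on the last line
    nvcc --version | tail -1

Note the minor-version skew (wheel compiled with cuda 11.1, nvcc 11.2); it was evidently tolerated here, since every op in the report above still shows [OKAY].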
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... utils[NO] ......................... [YES][NO] ...... [OKAY] quantizer .............. [NO] .......transformer_inference [OKAY].. [NO] .......-------------------------------------------------- [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 stochastic_transformer . [NO] ....... [OKAY] torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. 
[NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 ............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1torch cuda version nvcc version............... .....................11.1 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed info ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
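In short: torch 1.8.1 built against CUDA 11.1, nvcc 11.2 on the node, and the big-science branch of DeepSpeed at 0.4.2+bc17042. These version facts can be double-checked from Python; a minimal sketch, assuming the same conda environment is active:

    import torch
    import deepspeed

    # Mirrors the "DeepSpeed general environment info" block above
    # (nvcc is a node-level tool and is not visible from these APIs).
    print("torch version      :", torch.__version__)      # 1.8.1 here
    print("torch cuda version :", torch.version.cuda)     # 11.1 here
    print("torch install path :", list(torch.__path__))
    print("deepspeed info     :", deepspeed.__version__)  # 0.4.2+bc17042 here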
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
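The async_io warning appears benign for this run: it only means DeepSpeed cannot JIT-build its async_io op because the libaio development package is missing on the nodes, and the op is not needed here. A minimal sketch for checking whether the runtime library is even visible to the loader (the warning itself is about the -dev headers, which this check cannot see):

    import ctypes.util

    # find_library locates the shared library, not the dev headers;
    # both are needed before DeepSpeed can JIT-compile async_io.
    if ctypes.util.find_library("aio") is None:
        print("libaio missing - async_io op unavailable (fix: apt install libaio-dev)")
    else:
        print("libaio runtime found; dev headers may still be missing")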
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attnninja ............ [NO].................. .......[OKAY] [OKAY] -------------------------------------------------- op nametransformer ............................ installed[NO] ......... compatible [OKAY] -------------------------------------------------- stochastic_transformer . [NO] cpu_adam....... ...............[OKAY] [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] --------------------------------------------------....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] transformer_inference....... 
..[NO] [NO] ....... [OKAY] utils ..................transformer_inference [YES].. ......[NO] [OKAY]....... [OKAY] quantizer .............. [NO] .......utils [OKAY].................. [YES] ...... --------------------------------------------------[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1 torch cuda version ...............torch install path 11.1............... nvcc version ..................... 11.2 deepspeed install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ........... torch version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ....................deepspeed info 1.8.1................... 0.4.2+bc17042, bc17042, big-science torch cuda version deepspeed wheel compiled w................ ......11.1 torch 1.8, cuda 11.1nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ..................[OKAY] .................. ninjaninjaninja ninja.................................... .................. [OKAY][OKAY] .................. [OKAY][OKAY] -------------------------------------------------- [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- op name op name-------------------------------------------------- op name op nameop nameop name................ ................................................installed installedinstalled..installed ....compatible.. compatiblecompatiblecompatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ................................ ................op nameinstalledinstalled ..installed.. ................ compatible..compatible installed--------------------------------------------------compatible-------------------------------------------------- cpu_adam ............... 
cpu_adamcpu_adam [YES]cpu_adam ............... ............... ..................... [YES] [YES] [YES] [OKAY]...... ...... ...... [OKAY] [OKAY] [OKAY] op nameop name op nameop name................................ ................installedinstalled................ installed..installed.. ....compatiblecompatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- ..-------------------------------------------------- compatible -------------------------------------------------- fused_adam .............fused_adam fused_adam [NO] ............. fused_adam............. ....... [NO] [NO].............[OKAY] ..............[NO] [OKAY][OKAY].......fused_lamb cpu_adamcpu_adam cpu_adamcpu_adam ............... ............... ..............................[YES] [YES] [YES] [YES]...... ..................[OKAY] cpu_adamcpu_adam ...............cpu_adam............... [YES]cpu_adam............... [YES] [YES]............ ............... ...... [OKAY][OKAY] [YES][OKAY] [OKAY]............. [OKAY][OKAY][OKAY] ...... [OKAY] fused_lambfused_lamb[NO] ................................. fused_lamb [NO][NO] [OKAY]........................... [NO][OKAY][OKAY] fused_adamfused_adam fused_adamfused_adam ............. ............. .......................... [NO] [NO] [NO] ....... [NO]....... ....... [OKAY] ....... [OKAY] [OKAY][OKAY]....... fused_lamb[OKAY] fused_lamb fused_adamfused_adam fused_adam.......................... fused_adam[NO] [NO]............. .................... ....... [NO][OKAY] [NO] sparse_attn ............sparse_attn sparse_attn [NO] ........................sparse_attn....... ............[NO][NO][OKAY] [NO].............. .......[OKAY][OKAY] transformer[OKAY] .............fused_lamb [NO]..........................fused_lamb ....... [NO][NO] ............. ....... [NO][OKAY] ....... [OKAY][OKAY]....... .......[OKAY] fused_lamb [OKAY]....... fused_lamb............. [OKAY]............. ............transformer transformer............transformer[NO] ............[NO]................... [NO].......[NO][OKAY] [OKAY] fused_lamb[NO] .......[NO]............. [OKAY][NO] .......[OKAY]....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn fused_lamb....... .................... [OKAY][OKAY] stochastic_transformer stochastic_transformer. stochastic_transformer stochastic_transformer [NO]. ........[NO] . [NO] [OKAY][NO]....... ....... .......[OKAY] [OKAY][OKAY] sparse_attn............sparse_attn transformer ............[NO]........................ .......[NO][NO] [NO] [OKAY]..................... [NO] .......sparse_attn [OKAY]............ [NO] .......sparse_attn [OKAY] [OKAY][OKAY]transformer[OKAY] sparse_attn............ transformer ............ [NO]............ [NO]sparse_attn .......[NO] ....... ............ [OKAY] ....... [OKAY][NO] ............ transformer[NO] stochastic_transformertransformer................... [OKAY][NO]............. [OKAY]transformer ....... ............transformer [OKAY][NO]............ stochastic_transformer ....... [NO] [NO] [OKAY]stochastic_transformer ....... ....... [OKAY][OKAY]. ....... [NO] [OKAY] transformer stochastic_transformer [NO] .stochastic_transformer....... [NO][OKAY] . ....... [NO][OKAY] . ....... ............[NO] stochastic_transformer [OKAY][NO] ....... .[OKAY]....... ....... [OKAY] [NO] stochastic_transformer [OKAY] ....... .[OKAY] [NO]stochastic_transformer ....... [OKAY]. [NO] ....... 
[OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name................op name ................................installed................ ..installed installedinstalledcompatible .... ..-------------------------------------------------- compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam[YES] [YES] ............... ............... ...... [YES][OKAY]...... [YES]......[OKAY] ...... [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adam fused_adam [NO] ............. fused_lamb ....... ............. .............[NO] [OKAY] [NO] [NO]....... fused_lamb..............[OKAY] .............[OKAY] [OKAY] [NO] fused_lamb....... .............fused_lamb [OKAY] [NO] .................... sparse_attn [NO] [OKAY] ................... [NO][OKAY] .......sparse_attn [OKAY]............ [NO] ....... sparse_attntransformer[OKAY] ........................transformer sparse_attn [NO] [NO]............................... [NO].......[OKAY] [NO] .......[OKAY] [OKAY].......stochastic_transformer transformer [OKAY]............ .stochastic_transformer [NO]transformer.[NO] ..........................[NO] [OKAY][OKAY][NO] ....... [OKAY]....... stochastic_transformer[OKAY] . stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................ ................ ................................ installed installed installed ..installed .. .. ..compatible compatible compatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ..............................[YES] ............... [YES] ...... [YES][YES] ...... [OKAY]......[OKAY] ...... [OKAY][OKAY] fused_adam fused_adam............. fused_adam[NO]fused_adam............. ....... .......................... [NO] [OKAY] [NO][NO] ....... .......fused_lamb[OKAY] ....... ............. [OKAY] [OKAY] [NO] fused_lamb fused_lamb....... fused_lamb ............. .............[OKAY]............. [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............sparse_attn ............ ............ [NO] [NO]............ .......[NO] .......[NO] ....... [OKAY][OKAY] ....... [OKAY] [OKAY]stochastic_transformer transformer.transformertransformer [NO]............ ............ [NO]............ ....... [NO] [NO]....... [OKAY] ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformerstochastic_transformer ... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
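Every op marked [NO] above gets JIT-compiled by ninja the first time it is used. If first-use compilation on the compute nodes is undesirable, upstream DeepSpeed supports prebuilding ops at install time via DS_BUILD_* environment variables; a minimal sketch, assuming the big-science fork at 0.4.2+bc17042 honors the same flags (not verified from this log):

    # prebuild the fused optimizers instead of relying on JIT compilation
    # (DS_BUILD_* flags are documented for upstream DeepSpeed; assumed to
    # apply to this fork as well)
    cd DeepSpeed-big-science
    DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install -e . --no-cache-dir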
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. This can be fixed with: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
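The async_io op needs the libaio development headers. A minimal remediation sketch, assuming a Debian/Ubuntu-style node with root access (on a managed cluster like this one, an admin install or an environment module would be needed instead):

    # install the missing headers, then re-check op compatibility;
    # ds_report ships with DeepSpeed and reprints the op table above
    apt install libaio-dev
    ds_report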
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
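The same numbers can be double-checked from a shell before launching; the expected values below are taken from the report itself:

    # verify torch / CUDA / nvcc / deepspeed versions match the report
    python -c "import torch; print(torch.__version__, torch.version.cuda)"   # expect: 1.8.1 11.1
    nvcc --version                                                           # expect: 11.2
    python -c "import deepspeed; print(deepspeed.__version__)"               # expect: 0.4.2+bc17042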
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
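The two lines above come from Megatron probing for git metadata: the probe shells out to `type git`, which fails because git is not on PATH on the compute nodes, so both fields fall back to unknown. A hedged reproduction of the failing check (my interpretation of the log, not Megatron's exact code):

    # reproduce the probe that produced the message above
    /bin/sh -c 'type git' || echo '**** Git info for Megatron: git_hash=unknown git_branch=unknown ****'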
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... [OKAY] utilsquantizer ................................ [NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................. .................. [OKAY].................. [OKAY] [OKAY]-------------------------------------------------- [OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- op name................ op name ................op name installed ................ installed ................installed.... installed..compatiblecompatible compatible..-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam.............................. ............... ...............[YES][YES] [YES]......[YES]...... ......[OKAY]......[OKAY] [OKAY] [OKAY] fused_adam fused_adamfused_adamfused_adam............. [NO].......................... ............. .......[NO] [NO] [OKAY][NO].............. ....... [OKAY] fused_lamb[OKAY] [OKAY] .............fused_lamb [NO]fused_lamb fused_lamb ............. ....... .......................... [NO] [NO].......[OKAY][NO] ....... [OKAY] ....... [OKAY] [OKAY] sparse_attnsparse_attn sparse_attn ........................ sparse_attn[NO]............ [NO] ............[NO] .............. .......[NO][OKAY][OKAY] [OKAY]....... transformer [OKAY]............transformer [NO] transformer ....... transformer........................ [OKAY] ............ [NO] [NO] stochastic_transformer[NO] ....... ....... .[OKAY][OKAY]....... [NO][OKAY]stochastic_transformer ....... stochastic_transformer .[OKAY] stochastic_transformer[NO]. .......[NO] . [OKAY] .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name--------------------------------------------------op name op name ................op name ................ ................installed................ installed..installed compatible ..installed.. -------------------------------------------------- compatible .. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] cpu_adam [YES]cpu_adam .................................... [YES][YES][OKAY] fused_adam ...... ...... ............. [OKAY] [OKAY] [NO] ....... fused_adam[OKAY] ............. [NO] fused_lamb....... fused_adam[OKAY]fused_adam............. ..........................[NO]fused_lamb [NO].......[NO]............. .......[NO][OKAY]....... [OKAY][OKAY] ....... [OKAY] fused_lambfused_lamb .......................... sparse_attn[NO][NO] .......................... [NO][OKAY][OKAY] sparse_attn ....... ............[OKAY] [NO] ....... transformer[OKAY] ............ [NO]transformersparse_attn ...................sparse_attn ........................[OKAY] [NO] [NO][NO]....... [OKAY]stochastic_transformer ....... ....... . stochastic_transformer[NO][OKAY] [OKAY] ....... . transformer [OKAY] transformer[NO] ............ .......[NO]............ [OKAY]....... [NO][OKAY] ....... [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... 
.......[OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op nameop name................ ................................................installed installedinstalledinstalled .... compatible.. .. compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam ............... [YES]............... ............... [YES][YES]...... [YES] ......[OKAY]...... ...... [OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam.............[NO] fused_adam ............. 
[NO]....... ............. .......[OKAY][NO] [NO][OKAY]....... fused_lamb[OKAY]....... .............fused_lamb[OKAY] fused_lamb[NO] ............. ....................[NO] fused_lamb[NO].......[OKAY] ....... .............[OKAY] [OKAY][NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attnsparse_attn ............................... [OKAY]sparse_attn[NO][NO] .......................... transformer [NO] [OKAY] [OKAY] ................... [OKAY][NO] transformer.......transformertransformer [OKAY].................................... [NO][NO] [NO] .............. stochastic_transformer....... [OKAY] [OKAY] [OKAY]. [NO] stochastic_transformer....... stochastic_transformer stochastic_transformer [OKAY] . ..[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 
1.8.1 torch cuda version ...............torch cuda version 11.1............... 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.1 nvcc version ..................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version1.8.1 ....................torch version torch cuda version 1.8.1 .................... ............... 1.8.1torch cuda version11.1 ...............nvcc versiontorch cuda version 11.1 ............... ..................... nvcc version 11.1 11.2 ..................... nvcc version deepspeed install path 11.2 ..................... ........... deepspeed install path 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed install path deepspeed info........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info0.4.2+bc17042, bc17042, big-science deepspeed info...................deepspeed wheel compiled w. .........................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................................ ................................ installed installed installedinstalled .. .. ..compatible.. ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] --------------------------------------------------compatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................................ ................installed................installed installed.. installed.... compatiblecompatible ..compatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam[OKAY] ............... cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam...... ...............[OKAY]............... ............... ............... ............... [YES] [YES] [YES] ...... ............ [OKAY]fused_adam[OKAY][OKAY] [YES][YES][YES] ............ ......[OKAY] [OKAY][OKAY]fused_adam ............. [NO] ....... [OKAY] ............. [NO] ....... [OKAY]fused_adam fused_adamfused_lamb fused_adam............. fused_adam ............. .............[NO].............[NO] .......[NO][NO]....... [OKAY] .......[OKAY] .......[OKAY]fused_lamb [OKAY]............. fused_adam.............fused_adam ..........................fused_lamb[NO] [NO].......[NO]............. .......[OKAY]....... [NO] [OKAY][OKAY]fused_lamb....... fused_lamb[NO] ....................fused_lamb sparse_attn [OKAY][NO]............. .............fused_lamb [OKAY]fused_lamb[NO] ............ .......[NO][NO] [OKAY].............. .......................... ....... [NO] [NO] [OKAY] ....... [OKAY][OKAY] sparse_attntransformer ........................ [NO][NO] .............. [OKAY]sparse_attn[OKAY] .......[OKAY] [OKAY]sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn............ stochastic_transformer............ ............ [NO].[NO] [NO] [NO] ............................ [OKAY][OKAY][OKAY][OKAY] sparse_attn transformersparse_attn ............ ............ sparse_attn............ [NO][NO] ............ [NO]....... ....... [NO][OKAY].......[OKAY] [OKAY]....... transformer [OKAY]stochastic_transformer transformerstochastic_transformer transformer............ . ............ [NO] [NO] [NO] .............. .......[OKAY][OKAY] [OKAY] ............transformer [NO].transformer ............ [NO] ....... ............ ....... [NO] [OKAY] [OKAY][NO] ....... stochastic_transformer .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] stochastic_transformer.......[OKAY] [OKAY]. stochastic_transformer[NO] stochastic_transformer....... . [OKAY] [NO]. .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... torch install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ...............torch version 11.1.................... 1.8.1nvcc version ..................... torch cuda version11.2 ...............deepspeed install path 11.1........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info deepspeed install path................... ...........0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninja ninja...................................................... .................. [OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name ................................................installed installed..installedinstalled ..compatible.... compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam...............[OKAY] cpu_adam [YES]............... .....................[YES] [OKAY]......[YES] [OKAY]fused_adam...... ............. [OKAY][NO]fused_adam .................... fused_adam [OKAY] [NO] ............. .......[NO] [OKAY]fused_lamb....... fused_adam .............fused_lamb [OKAY] .......................... [NO] [NO]fused_lamb.......[NO] ..............[OKAY]............. [OKAY][NO] [OKAY]....... [OKAY] fused_lamb .............sparse_attn [NO]............ sparse_attn[NO]....... ...................sparse_attn [OKAY][NO]............[OKAY] [NO]....... transformer.......[OKAY] ............[OKAY] [NO]transformer transformer...................sparse_attn ............[OKAY][NO] ............[NO] ....... [NO].......stochastic_transformer[OKAY] [OKAY]....... . stochastic_transformer [OKAY]stochastic_transformer [NO] ......... [NO][NO][OKAY] transformer....... ............ ....... [OKAY] [NO] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name ................op name................................ installedinstalled................installed ......installed compatible compatiblecompatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam[YES] ............... .....................[YES]............... [OKAY][YES]......[YES] [OKAY]............ [OKAY] [OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY]fused_adamfused_adam[NO] .................... .............fused_lamb[OKAY][NO] [NO] ............. ....... fused_lamb....... [NO] [OKAY] .............[OKAY]....... [NO][OKAY] .......fused_lamb fused_lamb [OKAY] ............. ............. [NO][NO] .............. [OKAY]sparse_attn[OKAY] ............ [NO] .......sparse_attn [OKAY]............ [NO] transformer....... ............[OKAY] sparse_attnsparse_attn [NO] ............................... transformer [NO] [NO][OKAY] ............ ....... ....... [NO] [OKAY][OKAY]....... stochastic_transformer [OKAY]transformer . transformer ............[NO] stochastic_transformer [NO]............ ....... . .......[NO] [OKAY] [NO] [OKAY] ....... ....... [OKAY][OKAY]stochastic_transformer . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
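The blocks above are DeepSpeed's standard startup diagnostics; each launched rank prints its own copy, which is why the raw console stream repeats them. If needed, the same op-compatibility table and environment summary can be regenerated on a node with the ds_report utility that ships with DeepSpeed — a minimal sketch, assuming the same tr1-13B conda environment is active:

$ ds_report                                                    # re-prints the op report and environment info
$ python -c 'import deepspeed; print(deepspeed.__version__)'   # should show 0.4.2+bc17042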
sparse_attnsparse_attn [OKAY] ........................ ............ [NO][NO][NO]transformer ................................. [OKAY][NO][OKAY][OKAY] ....... transformer transformer[OKAY]transformer .................................... [NO]stochastic_transformer[NO] [NO] ..................... [OKAY].[OKAY][OKAY] [NO] stochastic_transformer....... stochastic_transformerstochastic_transformer[OKAY] . [NO]. ........ [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 
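The same environment facts can be queried directly from the installed packages, which is handy when comparing nodes. This is a minimal sketch, not part of the original run; it assumes only a standard torch/deepspeed install and uses their public version attributes:

```python
# Minimal sketch: reproduce the key facts from the
# "DeepSpeed general environment info" report above.
import torch
import deepspeed

print("torch install path ...", torch.__path__)        # list containing the site-packages dir
print("torch version ........", torch.__version__)     # e.g. 1.8.1
print("torch cuda version ...", torch.version.cuda)    # e.g. 11.1
print("deepspeed info .......", deepspeed.__version__) # e.g. 0.4.2+bc17042
```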
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1161730.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 110
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-data/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
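Two sanity checks worth noting here. First, the parallel layout is internally consistent: data_parallel_size × tensor_model_parallel_size × pipeline_model_parallel_size = 8 × 4 × 8 = 256 = world_size. Second, the announced ramp-up is fully determined by the `rampup_batch_size ['16', '16', '6_000_000']` triple (start, increment, ramp-up samples). The sketch below is plain Python, not Megatron's implementation; it assumes each increment is held for an equal share of the ramp-up samples, which matches the announcement above:

```python
# Back-of-the-envelope sketch: how "16 -> 2048 in increments of 16
# over 6_000_000 samples" unfolds, assuming equal sample intervals
# per increment (an assumption, not Megatron's exact code).
start, increment, rampup_samples = 16, 16, 6_000_000
target = 2048

num_increments = (target - start) // increment            # 127 steps of +16
samples_per_increment = rampup_samples / num_increments   # ~47_244 samples each

def global_batch_size(consumed_samples: int) -> int:
    """Global batch size in effect after `consumed_samples` samples."""
    if consumed_samples >= rampup_samples:
        return target
    steps = int(consumed_samples / samples_per_increment)
    return min(start + steps * increment, target)

for s in (0, 1_000_000, 3_000_000, 6_000_000):
    print(f"{s:>9} samples -> global batch size {global_batch_size(s)}")
```

Under this assumption the batch size reaches roughly 352 after 1M samples, 1024 after 3M, and the full 2048 at the 6M-sample mark.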
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
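The padded size follows from the arguments above: Megatron pads the vocabulary to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size = 128 × 4 = 512, so the embedding matrix splits evenly across the tensor-parallel ranks. A minimal sketch of that arithmetic (plain Python, mirroring the rule rather than calling Megatron):

```python
# Minimal sketch: reproduce "> padded vocab (size: 50257) with 431 dummy
# tokens (new size: 50688)" from the training arguments.
import math

orig_vocab_size = 50257                  # GPT-2 BPE vocabulary
make_vocab_size_divisible_by = 128
tensor_model_parallel_size = 4

multiple = make_vocab_size_divisible_by * tensor_model_parallel_size  # 512
padded = math.ceil(orig_vocab_size / multiple) * multiple             # 50688
print(f"padded vocab: {padded}, dummy tokens: {padded - orig_vocab_size}")
# -> padded vocab: 50688, dummy tokens: 431
```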
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... [OKAY][OKAY].................. [OKAY]----------------------------------------------------------------------------------------------------[OKAY] --------------------------------------------------op nameop name --------------------------------------------------................op name................ ................installedop nameinstalled ..installed.................. compatiblecompatible .. installed --------------------------------------------------compatible .. --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... [YES]cpu_adam............... ......[YES] ............... [OKAY] ...... [YES] [OKAY]cpu_adam ...... [OKAY] ...............fused_adam [YES].............fused_adam [NO]fused_adam .................... [OKAY][NO]...... .................... fused_lamb[NO][OKAY] ....................[OKAY] fused_lamb[NO][OKAY] .................... fused_lamb[OKAY][NO] .................... [NO][OKAY] ....... [OKAY] fused_adam ............. [NO] .......sparse_attn ............ sparse_attn[NO]sparse_attn ............................... [OKAY] [NO] [NO][OKAY] .......transformer ...................[OKAY] fused_lamb[OKAY] [NO]transformer .................... ............[NO][OKAY]transformer .......[NO]............ .......stochastic_transformer[NO] [OKAY]........[OKAY] [OKAY][NO] stochastic_transformer....... stochastic_transformer[OKAY] . .[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................ ................................installedinstalled installed..installed.. .... compatible compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam ...............cpu_adam.............................. [YES] ...............[YES] [YES] ...... ......[YES] ...... [OKAY]......[OKAY] [OKAY][OKAY] fused_adam .............fused_adam fused_adam fused_adam[NO] ............. ............. .......[NO] ............. [NO][OKAY] ....... [NO].......[OKAY] fused_lamb[OKAY]....... .............[OKAY]fused_lamb [NO]fused_lamb............. fused_lamb ....... .............[NO] ............. [OKAY][NO]....... [NO] ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformersparse_attn............ ............[NO]........................ [NO] .......[NO][NO] [OKAY] ..................... [OKAY]transformer[OKAY] [OKAY] ............ transformer [NO]stochastic_transformer ............transformer....... . [NO] ............ [OKAY] [NO][NO] ....... ....... ....... [OKAY] stochastic_transformer [OKAY][OKAY] stochastic_transformer. [NO]stochastic_transformer . ....... [OKAY].[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
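The async_io op is not used by this training run, so the warning is benign. If it ever needed silencing, the fix is to install libaio and rebuild the op; a sketch, assuming root access on the machine and DeepSpeed's standard DS_BUILD_AIO flag:

$ apt install libaio-dev
$ DS_BUILD_AIO=1 pip install --force-reinstall deepspeed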
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
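Note that the torch CUDA runtime (11.1) and the system nvcc (11.2) differ at the minor version; the wheel was compiled with torch 1.8 / cuda 11.1, and the run proceeds fine with this combination. The two versions can be double-checked directly on a node:

$ python -c "import torch; print(torch.__version__, torch.version.cuda)"
$ nvcc --version | grep release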
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
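The `git: not found` line appears because git is not in PATH on the compute nodes; Megatron shells out to git at startup to stamp the log with the code revision, and falls back to `unknown` when that fails. What it records is roughly the following (a hypothetical reconstruction; the exact commands live in the Megatron startup code):

$ git rev-parse --short HEAD        # git_hash
$ git rev-parse --abbrev-ref HEAD   # git_branch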
> setting codecarbon ...
......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ...... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 > initializing torch distributed ... 
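For readers replaying this setup: the op compatibility table above can also be queried programmatically rather than scraped from stdout. A minimal sketch, assuming a DeepSpeed 0.4-era layout where the op builders live in `deepspeed.ops.op_builder` (builder names and methods as I recall them for that version, so treat this as illustrative):

```python
# Minimal sketch: query the same op compatibility that the report above prints.
# Assumes a DeepSpeed ~0.4.x layout (deepspeed.ops.op_builder); names may
# differ in other versions.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the [OKAY] column in the table above.
    print(builder.name, "compatible:", builder.is_compatible())

# load() JIT-compiles the op with ninja when no prebuilt binary exists;
# that is what "JIT compiled ops requires ninja" refers to.
cpu_adam = CPUAdamBuilder().load()
```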
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 02:11:21,021] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.307 seconds
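The parallel layout and seed lines above are worth unpacking: with 4-way tensor and 8-way pipeline parallelism, plus the 8 data-parallel replicas visible in the topology dump further down, the job spans 4 x 8 x 8 = 256 ranks. The logged model-parallel seed is also consistent with Megatron's fixed RNG offset for tensor-parallel regions; a sketch of the arithmetic (the 2718 offset is Megatron's convention, not something configured in this run):

```python
# Sketch of the arithmetic behind the log lines above.
TP, PP, DP = 4, 8, 8       # TP/PP from the log; DP from the topology dump below
world_size = TP * PP * DP  # 256 ranks

seed = 42
tp_rank = 0                # the logged line is for model parallel rank 0
model_parallel_seed = seed + 2718 + tp_rank  # 42 + 2718 = 2760, matching the log
data_parallel_seed = seed                    # stays 42, matching the log
print(world_size, model_parallel_seed, data_parallel_seed)
```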
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 27.029 seconds
time to initialize megatron (seconds): 68.551
[after megatron is initialized] datetime: 2021-09-24 02:11:48
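The "compiling and loading fused kernels" phase above is PyTorch's JIT extension machinery: each fused kernel is built once into the `megatron/fused_kernels/build` directory and reloaded from cache on later runs (hence `ninja: no work to do.`). A rough sketch of the pattern, with illustrative paths and flags rather than Megatron's exact values; exporting `CXX=g++` before launch is also the usual way to silence the c++-vs-g++ UserWarning that was repeated once per rank above:

```python
# Rough sketch of the JIT build pattern behind the messages above.
# Paths and flags are illustrative assumptions, not Megatron's exact values.
import os
import pathlib
from torch.utils import cpp_extension

os.environ.setdefault("CXX", "g++")  # avoids the c++-vs-g++ UserWarning above

srcpath = pathlib.Path("megatron/fused_kernels")
buildpath = srcpath / "build"
buildpath.mkdir(parents=True, exist_ok=True)

# Emits build.ninja into buildpath and reuses cached objects on reruns.
scaled_softmax = cpp_extension.load(
    name="scaled_upper_triang_masked_softmax_cuda",
    sources=[
        str(srcpath / "scaled_upper_triang_masked_softmax.cpp"),
        str(srcpath / "scaled_upper_triang_masked_softmax_cuda.cu"),
    ],
    build_directory=str(buildpath),
    extra_cuda_cflags=["-O3", "--use_fast_math"],
)
```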
[2021-09-24 02:11:48,760] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-24 02:11:48,762] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-09-24 02:11:48,763] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.77 GB, percent = 20.2%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ..., ProcessCoord(pipe=7, data=7, model=3): 255}
[the full dump enumerates all 256 ranks of the 8-pipeline x 8-data x 4-tensor grid, with rank = pipe*32 + data*4 + model]
[2021-09-24 02:11:50,155] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2: <lambda>
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35: <lambda>
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
[equivalent lines report 1745293312 for every (tensor, pipeline) rank on pipeline stages 1-6]
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
[and likewise 1986498560 for tensor ranks 1-3 on stage 7]
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
[and likewise 1986465792 for tensor ranks 1-3 on stage 0]
[2021-09-24 02:11:51,439] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 02:11:51,440] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB  Max_MA 3.79 GB  CA 3.79 GB  Max_CA 4 GB
[2021-09-24 02:11:51,441] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.96 GB, percent = 20.3%
setting training iterations to 159576
 > learning rate decay style: cosine
DeepSpeed is enabled.
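The topology and parameter numbers above are internally consistent, and checking them is a quick way to confirm the intended 3D-parallel layout. A minimal sketch of that arithmetic (my own check, not part of the log; the rank formula is inferred from the first and last topology entries):

```python
# 8 pipeline stages x 8 data-parallel replicas x 4 tensor-parallel ranks = 256 GPUs.
PIPE, DATA, MODEL = 8, 8, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    # model varies fastest, then data, then pipe -- matching the printed dict
    return (pipe * DATA + data) * MODEL + model

assert coord_to_rank(0, 0, 1) == 1
assert coord_to_rank(1, 0, 0) == 32
assert coord_to_rank(7, 7, 3) == 255

# Per-(tensor, pipeline) parameter counts reported above:
mid_stage   = 1_745_293_312  # stages 1-6: 4 transformer layers each
first_stage = 1_986_465_792  # stage 0: embedding + 4 transformer layers
last_stage  = 1_986_498_560  # stage 7: 4 layers + final norm + tied embedding

total = MODEL * (6 * mid_stage + first_stage + last_stage)
assert total == 57_778_896_896  # TOTAL_PARAMS printed by DeepSpeed below

# UNIQUE_PARAMS below is smaller because the tied word embedding is counted
# on both the first and the last stage:
print(f"double-counted parameters: {total - 56_814_206_976:,}")  # 964,689,920
```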
[2021-09-24 02:11:51,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 02:11:51,606] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 02:11:51,606] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 02:11:51,606] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 02:11:51,607] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 02:11:51,607] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 02:11:51,607] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 02:11:51,607] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 02:11:51,607] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 02:11:56,299] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 02:11:56,299] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 02:11:56,299] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 02:11:56,300] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 02:11:56,300] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   amp_params ................... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   dump_state ................... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-09-24 02:11:56,300] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 256
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   pld_params ................... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   train_batch_size ............. 2048
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  1
[2021-09-24 02:11:56,301] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   world_size ................... 8
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-09-24 02:11:56,302] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-09-24 02:11:56,302] [INFO] [config.py:906:print]   json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 02:11:56,302] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 02:11:56,606] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[equivalent lines follow for the other ranks: stages 1-6 (LAYERS=4) report STAGE_PARAMS=1745293312 (1745.293M), stage 7 (LAYERS=8 [31, 39)) reports STAGE_PARAMS=1986498560 (1986.499M); TOTAL_PARAMS and UNIQUE_PARAMS are identical on every rank]
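The `json = {...}` block in the configuration dump is the config the training script handed to DeepSpeed. A minimal sketch of the same settings written out as a standalone config dict, with the batch-size bookkeeping made explicit (the values are copied from the log; how the script actually supplies the file, e.g. via `--deepspeed_config`, is an assumption here):

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2048,
    "gradient_clipping": 1.0,
    "zero_optimization": {"stage": 1},
    "fp16": {
        "enabled": True,
        "loss_scale": 0,             # 0 = dynamic loss scaling
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12,   # 2**12 = 4096, the logged initial_dynamic_scale
    },
    "steps_per_print": 2000,
    "wall_clock_breakdown": False,
}

# DeepSpeed's batch-size invariant, matching the logged values exactly:
#   train_batch_size = micro_batch * gradient_accumulation_steps * data-parallel world size
assert 2048 == 1 * 256 * 8  # hence micro_batches=256 and world_size=8 above
```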
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[the same warning is emitted once by every rank; the duplicates are omitted]
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
    will not load any checkpoints and will start from random
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,752] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-09-24 02:11:56,753] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
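Every rank prints the warning above because this is a fresh start: when no explicit tag is given, DeepSpeed resolves which checkpoint to resume from by reading a plain-text file named `latest` inside the checkpoint directory, and on a first run that file does not exist yet, so the warning is benign. A minimal sketch of that lookup (the function name and structure here are illustrative, not the engine's actual code):

```python
# Minimal sketch of the `latest`-file lookup that produces the warning above;
# illustrative only, not DeepSpeed's real implementation.
import os
from typing import Optional

def resolve_checkpoint_tag(load_dir: str, tag: Optional[str] = None) -> Optional[str]:
    if tag is not None:
        return tag  # an explicit tag skips the `latest` lookup entirely
    latest_path = os.path.join(load_dir, "latest")
    if not os.path.isfile(latest_path):
        return None  # fresh run: "Unable to find latest file at ..."
    with open(latest_path) as fh:
        return fh.read().strip()  # e.g. "global_step1000"
```

Once a checkpoint has been saved, the `latest` file simply contains the tag of the newest one, so subsequent restarts resume automatically without passing a tag.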
time (ms) | load-checkpoint: 1.91
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 02:11:56
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.214922 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.337 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.309 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.060 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 02:12:03
done with setup ...
training ...
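The `*_doc_idx.npy`, `*_sample_idx.npy` and `*_shuffle_idx.npy` files logged above are Megatron-LM's cached sampling maps, which relate each of the ~395M training samples back to document positions. They are loaded memory-mapped so the mapping never has to be materialized in RAM at once. A rough sketch of that loading step (the helper name is illustrative):

```python
# Rough sketch, assuming Megatron-LM-style index caching: the three cached
# arrays are plain NumPy files, opened memory-mapped rather than read into RAM.
import numpy as np

def load_index_mappings(prefix: str):
    # prefix is e.g. ".../meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s"
    doc_idx = np.load(prefix + "_doc_idx.npy", mmap_mode="r")
    sample_idx = np.load(prefix + "_sample_idx.npy", mmap_mode="r")
    shuffle_idx = np.load(prefix + "_shuffle_idx.npy", mmap_mode="r")
    return doc_idx, sample_idx, shuffle_idx
```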
time (ms) | model-and-optimizer-setup: 8062.72 | train/valid/test-data-iterators-setup: 5729.09
[before the start of training step] datetime: 2021-09-24 02:12:03
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 02:12:03,365] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 33] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 65] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 97] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 225] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 129] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 193] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 161] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 2] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 34] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 226] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 66] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18778.0 | max reserved: 18778.0
[Rank 98] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 130] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 194] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18650.0 | max reserved: 18650.0
[Rank 162] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 0] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21470.0 | max reserved: 21470.0
[Rank 64] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 19252.0 | max reserved: 19252.0
[Rank 32] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 128] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 96] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 224] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 192] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 160] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18868.0 | max reserved: 18868.0
[Rank 35] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 3] (after 1 iterations) memory (MB) | allocated: 6661.611328125 | max allocated: 11742.55810546875 | reserved: 21150.0 | max reserved: 21150.0
[Rank 67] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 99] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
[Rank 131] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18522.0 | max reserved: 18522.0
[Rank 227] (after 1 iterations) memory (MB) | allocated: 7107.70751953125 | max allocated: 11884.6845703125 | reserved: 21700.0 | max reserved: 21700.0
[Rank 195] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18586.0 | max reserved: 18586.0
[Rank 163] (after 1 iterations) memory (MB) | allocated: 5861.5498046875 | max allocated: 10450.46337890625 | reserved: 18442.0 | max reserved: 18442.0
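The per-rank numbers above map one-to-one onto PyTorch's CUDA memory counters. A minimal sketch of how such a report can be produced (the helper name is ours, not from the training code):

    # Hypothetical helper; torch.cuda exposes exactly these four counters.
    import torch
    import torch.distributed as dist

    def report_memory(tag: str = "(after 1 iterations)") -> str:
        mb = 1024 * 1024
        rank = dist.get_rank() if dist.is_initialized() else 0
        return (
            f"[Rank {rank}] {tag} memory (MB)"
            f" | allocated: {torch.cuda.memory_allocated() / mb}"
            f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
            f" | reserved: {torch.cuda.memory_reserved() / mb}"
            f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
        )

The spread between ranks (e.g. ~18.4 GB vs ~22.5 GB reserved) reflects the different pipeline stages holding different slices of the model.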
iteration 1/ 159576 | consumed samples: 16 | elapsed time per iteration (ms): 31536.2 | learning rate: 4.438E-09 | global batch size: 16 | lm loss: 1.426722E+01 | loss scale: 4096.0 | grad norm: 1863985.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2/ 159576 | consumed samples: 32 | elapsed time per iteration (ms): 13049.6 | learning rate: 8.876E-09 | global batch size: 16 | lm loss: 1.429125E+01 | loss scale: 4096.0 | grad norm: 1882741.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3/ 159576 | consumed samples: 48 | elapsed time per iteration (ms): 13671.4 | learning rate: 1.331E-08 | global batch size: 16 | lm loss: 1.421026E+01 | loss scale: 4096.0 | grad norm: 1871916.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4/ 159576 | consumed samples: 64 | elapsed time per iteration (ms): 13544.5 | learning rate: 1.775E-08 | global batch size: 16 | lm loss: 1.424627E+01 | loss scale: 4096.0 | grad norm: 1912485.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5/ 159576 | consumed samples: 80 | elapsed time per iteration (ms): 13955.0 | learning rate: 2.219E-08 | global batch size: 16 | lm loss: 1.421161E+01 | loss scale: 4096.0 | grad norm: 1873991.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6/ 159576 | consumed samples: 96 | elapsed time per iteration (ms): 13725.9 | learning rate: 2.663E-08 | global batch size: 16 | lm loss: 1.423833E+01 | loss scale: 4096.0 | grad norm: 1889068.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7/ 159576 | consumed samples: 112 | elapsed time per iteration (ms): 13496.8 | learning rate: 3.107E-08 | global batch size: 16 | lm loss: 1.423929E+01 | loss scale: 4096.0 | grad norm: 1864001.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8/ 159576 | consumed samples: 128 | elapsed time per iteration (ms): 13565.8 | learning rate: 3.550E-08 | global batch size: 16 | lm loss: 1.424760E+01 | loss scale: 4096.0 | grad norm: 1867381.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9/ 159576 | consumed samples: 144 | elapsed time per iteration (ms): 14076.3 | learning rate: 3.994E-08 | global batch size: 16 | lm loss: 1.418199E+01 | loss scale: 4096.0 | grad norm: 1902029.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 10/ 159576 | consumed samples: 160 | elapsed time per iteration (ms): 13497.5 | learning rate: 4.438E-08 | global batch size: 16 | lm loss: 1.412427E+01 | loss scale: 4096.0 | grad norm: 1865649.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11/ 159576 | consumed samples: 176 | elapsed time per iteration (ms): 13459.5 | learning rate: 4.882E-08 | global batch size: 16 | lm loss: 1.407386E+01 | loss scale: 4096.0 | grad norm: 1861067.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12/ 159576 | consumed samples: 192 | elapsed time per iteration (ms): 13581.0 | learning rate: 5.325E-08 | global batch size: 16 | lm loss: 1.400436E+01 | loss scale: 4096.0 | grad norm: 1857208.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 13/ 159576 | consumed samples: 208 | elapsed time per iteration (ms): 13877.0 | learning rate: 5.769E-08 | global batch size: 16 | lm loss: 1.374212E+01 | loss scale: 4096.0 | grad norm: 1860712.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 14/ 159576 | consumed samples: 224 | elapsed time per iteration (ms): 13730.6 | learning rate: 6.213E-08 | global batch size: 16 | lm loss: 1.363158E+01 | loss scale: 4096.0 | grad norm: 1835837.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 15/ 159576 | consumed samples: 240 | elapsed time per iteration (ms): 13589.9 | learning rate: 6.657E-08 | global batch size: 16 | lm loss: 1.353429E+01 | loss scale: 4096.0 | grad norm: 1866742.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 16/ 159576 | consumed samples: 256 | elapsed time per iteration (ms): 13709.9 | learning rate: 7.101E-08 | global batch size: 16 | lm loss: 1.346230E+01 | loss scale: 4096.0 | grad norm: 1867848.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 17/ 159576 | consumed samples: 272 | elapsed time per iteration (ms): 13515.8 | learning rate: 7.544E-08 | global batch size: 16 | lm loss: 1.257517E+01 | loss scale: 4096.0 | grad norm: 1827444.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
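The learning rate climbs by exactly 4.438E-09 per 16-sample batch, i.e. it is linear in consumed samples, which is what a linear warmup schedule looks like. A sketch that reproduces the logged values, assuming a peak LR of 6e-5 reached after 216,320 warmup samples (both values inferred from the increment, not read from the run's config):

    PEAK_LR = 6e-5            # assumed
    WARMUP_SAMPLES = 216_320  # assumed

    def warmup_lr(consumed_samples: int) -> float:
        """Linear warmup: LR proportional to samples consumed so far."""
        return PEAK_LR * min(consumed_samples, WARMUP_SAMPLES) / WARMUP_SAMPLES

    for it in (1, 2, 100, 200):
        print(it, f"{warmup_lr(16 * it):.3E}")
    # -> 4.438E-09, 8.876E-09, 4.438E-07, 8.876E-07, matching the log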
iteration 18/ 159576 | consumed samples: 288 | elapsed time per iteration (ms): 13800.0 | learning rate: 7.988E-08 | global batch size: 16 | lm loss: 1.251998E+01 | loss scale: 4096.0 | grad norm: 2020558.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 19/ 159576 | consumed samples: 304 | elapsed time per iteration (ms): 13516.3 | learning rate: 8.432E-08 | global batch size: 16 | lm loss: 1.265157E+01 | loss scale: 4096.0 | grad norm: 2257407.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 20/ 159576 | consumed samples: 320 | elapsed time per iteration (ms): 13549.6 | learning rate: 8.876E-08 | global batch size: 16 | lm loss: 1.252521E+01 | loss scale: 4096.0 | grad norm: 2095375.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21/ 159576 | consumed samples: 336 | elapsed time per iteration (ms): 13586.7 | learning rate: 9.320E-08 | global batch size: 16 | lm loss: 1.244903E+01 | loss scale: 4096.0 | grad norm: 2211855.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22/ 159576 | consumed samples: 352 | elapsed time per iteration (ms): 14140.0 | learning rate: 9.763E-08 | global batch size: 16 | lm loss: 1.221426E+01 | loss scale: 4096.0 | grad norm: 2152853.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23/ 159576 | consumed samples: 368 | elapsed time per iteration (ms): 13565.7 | learning rate: 1.021E-07 | global batch size: 16 | lm loss: 1.223387E+01 | loss scale: 4096.0 | grad norm: 2257726.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24/ 159576 | consumed samples: 384 | elapsed time per iteration (ms): 13529.2 | learning rate: 1.065E-07 | global batch size: 16 | lm loss: 1.252795E+01 | loss scale: 4096.0 | grad norm: 2648402.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25/ 159576 | consumed samples: 400 | elapsed time per iteration (ms): 13468.4 | learning rate: 1.109E-07 | global batch size: 16 | lm loss: 1.249682E+01 | loss scale: 4096.0 | grad norm: 2816711.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26/ 159576 | consumed samples: 416 | elapsed time per iteration (ms): 13529.9 | learning rate: 1.154E-07 | global batch size: 16 | lm loss: 1.219784E+01 | loss scale: 4096.0 | grad norm: 2380750.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27/ 159576 | consumed samples: 432 | elapsed time per iteration (ms): 13833.4 | learning rate: 1.198E-07 | global batch size: 16 | lm loss: 1.182601E+01 | loss scale: 4096.0 | grad norm: 2116005.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28/ 159576 | consumed samples: 448 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.243E-07 | global batch size: 16 | lm loss: 1.159655E+01 | loss scale: 4096.0 | grad norm: 1805209.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29/ 159576 | consumed samples: 464 | elapsed time per iteration (ms): 13371.2 | learning rate: 1.287E-07 | global batch size: 16 | lm loss: 1.165552E+01 | loss scale: 4096.0 | grad norm: 1731569.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30/ 159576 | consumed samples: 480 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.331E-07 | global batch size: 16 | lm loss: 1.154380E+01 | loss scale: 4096.0 | grad norm: 1706578.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31/ 159576 | consumed samples: 496 | elapsed time per iteration (ms): 13982.3 | learning rate: 1.376E-07 | global batch size: 16 | lm loss: 1.139362E+01 | loss scale: 4096.0 | grad norm: 1757980.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32/ 159576 | consumed samples: 512 | elapsed time per iteration (ms): 13306.0 | learning rate: 1.420E-07 | global batch size: 16 | lm loss: 1.148209E+01 | loss scale: 4096.0 | grad norm: 1697993.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33/ 159576 | consumed samples: 528 | elapsed time per iteration (ms): 13575.8 | learning rate: 1.464E-07 | global batch size: 16 | lm loss: 1.140995E+01 | loss scale: 4096.0 | grad norm: 1670562.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34/ 159576 | consumed samples: 544 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.509E-07 | global batch size: 16 | lm loss: 1.132776E+01 | loss scale: 4096.0 | grad norm: 1643305.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35/ 159576 | consumed samples: 560 | elapsed time per iteration (ms): 13869.9 | learning rate: 1.553E-07 | global batch size: 16 | lm loss: 1.136237E+01 | loss scale: 4096.0 | grad norm: 1648846.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36/ 159576 | consumed samples: 576 | elapsed time per iteration (ms): 13789.0 | learning rate: 1.598E-07 | global batch size: 16 | lm loss: 1.143323E+01 | loss scale: 4096.0 | grad norm: 1598861.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37/ 159576 | consumed samples: 592 | elapsed time per iteration (ms): 13658.0 | learning rate: 1.642E-07 | global batch size: 16 | lm loss: 1.115875E+01 | loss scale: 4096.0 | grad norm: 1562919.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38/ 159576 | consumed samples: 608 | elapsed time per iteration (ms): 13961.2 | learning rate: 1.686E-07 | global batch size: 16 | lm loss: 1.117768E+01 | loss scale: 4096.0 | grad norm: 1565543.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39/ 159576 | consumed samples: 624 | elapsed time per iteration (ms): 13410.4 | learning rate: 1.731E-07 | global batch size: 16 | lm loss: 1.111340E+01 | loss scale: 4096.0 | grad norm: 1536768.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40/ 159576 | consumed samples: 640 | elapsed time per iteration (ms): 13891.8 | learning rate: 1.775E-07 | global batch size: 16 | lm loss: 1.106657E+01 | loss scale: 4096.0 | grad norm: 1548421.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41/ 159576 | consumed samples: 656 | elapsed time per iteration (ms): 13633.3 | learning rate: 1.820E-07 | global batch size: 16 | lm loss: 1.094995E+01 | loss scale: 4096.0 | grad norm: 1532446.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
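The loss scale sits at 4096.0 throughout, and "number of skipped iterations" stays 0: no fp16 overflows yet. Not the DeepSpeed implementation, just the standard dynamic-loss-scaling rule such runs follow (window size here is a typical default, an assumption):

    class DynamicLossScaler:
        """Halve the scale on overflow, double after a window of clean steps."""
        def __init__(self, init_scale=4096.0, window=1000):
            self.scale, self.window, self.clean_steps = init_scale, window, 0

        def update(self, found_overflow: bool):
            if found_overflow:
                self.scale = max(self.scale / 2, 1.0)  # back off; step is skipped
                self.clean_steps = 0
            else:
                self.clean_steps += 1
                if self.clean_steps % self.window == 0:
                    self.scale *= 2  # cautiously grow again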
iteration 42/ 159576 | consumed samples: 672 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.864E-07 | global batch size: 16 | lm loss: 1.087856E+01 | loss scale: 4096.0 | grad norm: 1531337.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43/ 159576 | consumed samples: 688 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.908E-07 | global batch size: 16 | lm loss: 1.084412E+01 | loss scale: 4096.0 | grad norm: 1473539.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44/ 159576 | consumed samples: 704 | elapsed time per iteration (ms): 14118.0 | learning rate: 1.953E-07 | global batch size: 16 | lm loss: 1.114596E+01 | loss scale: 4096.0 | grad norm: 1496700.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45/ 159576 | consumed samples: 720 | elapsed time per iteration (ms): 13853.8 | learning rate: 1.997E-07 | global batch size: 16 | lm loss: 1.092829E+01 | loss scale: 4096.0 | grad norm: 1454980.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46/ 159576 | consumed samples: 736 | elapsed time per iteration (ms): 13549.0 | learning rate: 2.041E-07 | global batch size: 16 | lm loss: 1.074461E+01 | loss scale: 4096.0 | grad norm: 1397083.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47/ 159576 | consumed samples: 752 | elapsed time per iteration (ms): 13627.3 | learning rate: 2.086E-07 | global batch size: 16 | lm loss: 1.066580E+01 | loss scale: 4096.0 | grad norm: 1311670.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48/ 159576 | consumed samples: 768 | elapsed time per iteration (ms): 13674.9 | learning rate: 2.130E-07 | global batch size: 16 | lm loss: 1.055744E+01 | loss scale: 4096.0 | grad norm: 1292299.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49/ 159576 | consumed samples: 784 | elapsed time per iteration (ms): 13932.1 | learning rate: 2.175E-07 | global batch size: 16 | lm loss: 1.060610E+01 | loss scale: 4096.0 | grad norm: 1283482.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50/ 159576 | consumed samples: 800 | elapsed time per iteration (ms): 13665.9 | learning rate: 2.219E-07 | global batch size: 16 | lm loss: 1.063007E+01 | loss scale: 4096.0 | grad norm: 1228203.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51/ 159576 | consumed samples: 816 | elapsed time per iteration (ms): 13667.5 | learning rate: 2.263E-07 | global batch size: 16 | lm loss: 1.046357E+01 | loss scale: 4096.0 | grad norm: 1219490.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52/ 159576 | consumed samples: 832 | elapsed time per iteration (ms): 13793.6 | learning rate: 2.308E-07 | global batch size: 16 | lm loss: 1.061804E+01 | loss scale: 4096.0 | grad norm: 1197068.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53/ 159576 | consumed samples: 848 | elapsed time per iteration (ms): 14209.6 | learning rate: 2.352E-07 | global batch size: 16 | lm loss: 1.041930E+01 | loss scale: 4096.0 | grad norm: 1168890.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54/ 159576 | consumed samples: 864 | elapsed time per iteration (ms): 13453.2 | learning rate: 2.396E-07 | global batch size: 16 | lm loss: 1.035855E+01 | loss scale: 4096.0 | grad norm: 1126594.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 55/ 159576 | consumed samples: 880 | elapsed time per iteration (ms): 13666.6 | learning rate: 2.441E-07 | global batch size: 16 | lm loss: 1.051081E+01 | loss scale: 4096.0 | grad norm: 1080949.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56/ 159576 | consumed samples: 896 | elapsed time per iteration (ms): 13689.5 | learning rate: 2.485E-07 | global batch size: 16 | lm loss: 1.048364E+01 | loss scale: 4096.0 | grad norm: 1069119.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57/ 159576 | consumed samples: 912 | elapsed time per iteration (ms): 14289.6 | learning rate: 2.530E-07 | global batch size: 16 | lm loss: 1.048154E+01 | loss scale: 4096.0 | grad norm: 1016407.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58/ 159576 | consumed samples: 928 | elapsed time per iteration (ms): 13663.2 | learning rate: 2.574E-07 | global batch size: 16 | lm loss: 1.019213E+01 | loss scale: 4096.0 | grad norm: 982402.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59/ 159576 | consumed samples: 944 | elapsed time per iteration (ms): 13704.5 | learning rate: 2.618E-07 | global batch size: 16 | lm loss: 1.019982E+01 | loss scale: 4096.0 | grad norm: 965254.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60/ 159576 | consumed samples: 960 | elapsed time per iteration (ms): 13846.3 | learning rate: 2.663E-07 | global batch size: 16 | lm loss: 1.021626E+01 | loss scale: 4096.0 | grad norm: 926021.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61/ 159576 | consumed samples: 976 | elapsed time per iteration (ms): 13469.9 | learning rate: 2.707E-07 | global batch size: 16 | lm loss: 1.008368E+01 | loss scale: 4096.0 | grad norm: 911608.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62/ 159576 | consumed samples: 992 | elapsed time per iteration (ms): 13774.9 | learning rate: 2.751E-07 | global batch size: 16 | lm loss: 9.892099E+00 | loss scale: 4096.0 | grad norm: 882114.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63/ 159576 | consumed samples: 1008 | elapsed time per iteration (ms): 13514.1 | learning rate: 2.796E-07 | global batch size: 16 | lm loss: 9.876393E+00 | loss scale: 4096.0 | grad norm: 834416.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64/ 159576 | consumed samples: 1024 | elapsed time per iteration (ms): 13538.5 | learning rate: 2.840E-07 | global batch size: 16 | lm loss: 9.927294E+00 | loss scale: 4096.0 | grad norm: 814691.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65/ 159576 | consumed samples: 1040 | elapsed time per iteration (ms): 13496.5 | learning rate: 2.885E-07 | global batch size: 16 | lm loss: 1.024293E+01 | loss scale: 4096.0 | grad norm: 821175.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
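The "grad norm" column (trending down from ~1.9e6 as the loss falls) is the global L2 norm of all gradients, the same quantity torch.nn.utils.clip_grad_norm_ computes before clipping. A local-only sketch; in the real run the partial norms are additionally reduced across the parallel ranks:

    import torch

    def global_grad_norm(parameters) -> float:
        """L2 norm over the concatenation of all parameter gradients."""
        total_sq = 0.0
        for p in parameters:
            if p.grad is not None:
                total_sq += p.grad.detach().float().norm(2).item() ** 2
        return total_sq ** 0.5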
iteration 66/ 159576 | consumed samples: 1056 | elapsed time per iteration (ms): 14030.7 | learning rate: 2.929E-07 | global batch size: 16 | lm loss: 9.930872E+00 | loss scale: 4096.0 | grad norm: 759629.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 67/ 159576 | consumed samples: 1072 | elapsed time per iteration (ms): 13743.1 | learning rate: 2.973E-07 | global batch size: 16 | lm loss: 9.852800E+00 | loss scale: 4096.0 | grad norm: 734440.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 68/ 159576 | consumed samples: 1088 | elapsed time per iteration (ms): 13293.2 | learning rate: 3.018E-07 | global batch size: 16 | lm loss: 9.786448E+00 | loss scale: 4096.0 | grad norm: 702591.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 69/ 159576 | consumed samples: 1104 | elapsed time per iteration (ms): 13515.6 | learning rate: 3.062E-07 | global batch size: 16 | lm loss: 9.917148E+00 | loss scale: 4096.0 | grad norm: 689937.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 70/ 159576 | consumed samples: 1120 | elapsed time per iteration (ms): 13786.0 | learning rate: 3.107E-07 | global batch size: 16 | lm loss: 9.593161E+00 | loss scale: 4096.0 | grad norm: 634541.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 71/ 159576 | consumed samples: 1136 | elapsed time per iteration (ms): 13761.6 | learning rate: 3.151E-07 | global batch size: 16 | lm loss: 9.685747E+00 | loss scale: 4096.0 | grad norm: 620089.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 72/ 159576 | consumed samples: 1152 | elapsed time per iteration (ms): 13503.1 | learning rate: 3.195E-07 | global batch size: 16 | lm loss: 9.550736E+00 | loss scale: 4096.0 | grad norm: 592735.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 73/ 159576 | consumed samples: 1168 | elapsed time per iteration (ms): 13574.6 | learning rate: 3.240E-07 | global batch size: 16 | lm loss: 9.780053E+00 | loss scale: 4096.0 | grad norm: 578902.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74/ 159576 | consumed samples: 1184 | elapsed time per iteration (ms): 13563.6 | learning rate: 3.284E-07 | global batch size: 16 | lm loss: 9.660094E+00 | loss scale: 4096.0 | grad norm: 549632.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75/ 159576 | consumed samples: 1200 | elapsed time per iteration (ms): 13751.3 | learning rate: 3.328E-07 | global batch size: 16 | lm loss: 9.715110E+00 | loss scale: 4096.0 | grad norm: 523457.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 76/ 159576 | consumed samples: 1216 | elapsed time per iteration (ms): 13613.9 | learning rate: 3.373E-07 | global batch size: 16 | lm loss: 9.548697E+00 | loss scale: 4096.0 | grad norm: 559789.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 77/ 159576 | consumed samples: 1232 | elapsed time per iteration (ms): 13668.9 | learning rate: 3.417E-07 | global batch size: 16 | lm loss: 9.395579E+00 | loss scale: 4096.0 | grad norm: 516053.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 78/ 159576 | consumed samples: 1248 | elapsed time per iteration (ms): 13540.8 | learning rate: 3.462E-07 | global batch size: 16 | lm loss: 9.450207E+00 | loss scale: 4096.0 | grad norm: 491518.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 79/ 159576 | consumed samples: 1264 | elapsed time per iteration (ms): 13951.5 | learning rate: 3.506E-07 | global batch size: 16 | lm loss: 9.312221E+00 | loss scale: 4096.0 | grad norm: 445025.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 80/ 159576 | consumed samples: 1280 | elapsed time per iteration (ms): 13710.1 | learning rate: 3.550E-07 | global batch size: 16 | lm loss: 9.362122E+00 | loss scale: 4096.0 | grad norm: 498046.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 81/ 159576 | consumed samples: 1296 | elapsed time per iteration (ms): 13653.8 | learning rate: 3.595E-07 | global batch size: 16 | lm loss: 9.684261E+00 | loss scale: 4096.0 | grad norm: 460137.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 82/ 159576 | consumed samples: 1312 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.639E-07 | global batch size: 16 | lm loss: 9.111031E+00 | loss scale: 4096.0 | grad norm: 462196.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 83/ 159576 | consumed samples: 1328 | elapsed time per iteration (ms): 13589.7 | learning rate: 3.683E-07 | global batch size: 16 | lm loss: 9.424231E+00 | loss scale: 4096.0 | grad norm: 387492.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 84/ 159576 | consumed samples: 1344 | elapsed time per iteration (ms): 13890.8 | learning rate: 3.728E-07 | global batch size: 16 | lm loss: 9.225885E+00 | loss scale: 4096.0 | grad norm: 477146.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 85/ 159576 | consumed samples: 1360 | elapsed time per iteration (ms): 13578.1 | learning rate: 3.772E-07 | global batch size: 16 | lm loss: 9.449253E+00 | loss scale: 4096.0 | grad norm: 498838.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 86/ 159576 | consumed samples: 1376 | elapsed time per iteration (ms): 13600.8 | learning rate: 3.817E-07 | global batch size: 16 | lm loss: 9.186915E+00 | loss scale: 4096.0 | grad norm: 359821.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 87/ 159576 | consumed samples: 1392 | elapsed time per iteration (ms): 13578.0 | learning rate: 3.861E-07 | global batch size: 16 | lm loss: 9.169426E+00 | loss scale: 4096.0 | grad norm: 336361.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 88/ 159576 | consumed samples: 1408 | elapsed time per iteration (ms): 14258.1 | learning rate: 3.905E-07 | global batch size: 16 | lm loss: 9.174639E+00 | loss scale: 4096.0 | grad norm: 513262.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89/ 159576 | consumed samples: 1424 | elapsed time per iteration (ms): 13350.5 | learning rate: 3.950E-07 | global batch size: 16 | lm loss: 9.322023E+00 | loss scale: 4096.0 | grad norm: 417913.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90/ 159576 | consumed samples: 1440 | elapsed time per iteration (ms): 13582.0 | learning rate: 3.994E-07 | global batch size: 16 | lm loss: 9.319530E+00 | loss scale: 4096.0 | grad norm: 326159.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91/ 159576 | consumed samples: 1456 | elapsed time per iteration (ms): 13577.6 | learning rate: 4.038E-07 | global batch size: 16 | lm loss: 9.305362E+00 | loss scale: 4096.0 | grad norm: 312504.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92/ 159576 | consumed samples: 1472 | elapsed time per iteration (ms): 13979.9 | learning rate: 4.083E-07 | global batch size: 16 | lm loss: 8.797226E+00 | loss scale: 4096.0 | grad norm: 299274.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93/ 159576 | consumed samples: 1488 | elapsed time per iteration (ms): 13685.6 | learning rate: 4.127E-07 | global batch size: 16 | lm loss: 9.470177E+00 | loss scale: 4096.0 | grad norm: 889931.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 94/ 159576 | consumed samples: 1504 | elapsed time per iteration (ms): 13625.1 | learning rate: 4.172E-07 | global batch size: 16 | lm loss: 9.601658E+00 | loss scale: 4096.0 | grad norm: 858157.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 95/ 159576 | consumed samples: 1520 | elapsed time per iteration (ms): 13713.7 | learning rate: 4.216E-07 | global batch size: 16 | lm loss: 9.093191E+00 | loss scale: 4096.0 | grad norm: 308888.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 96/ 159576 | consumed samples: 1536 | elapsed time per iteration (ms): 13441.7 | learning rate: 4.260E-07 | global batch size: 16 | lm loss: 9.258781E+00 | loss scale: 4096.0 | grad norm: 285375.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 97/ 159576 | consumed samples: 1552 | elapsed time per iteration (ms): 13952.1 | learning rate: 4.305E-07 | global batch size: 16 | lm loss: 9.267257E+00 | loss scale: 4096.0 | grad norm: 266598.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 98/ 159576 | consumed samples: 1568 | elapsed time per iteration (ms): 13570.4 | learning rate: 4.349E-07 | global batch size: 16 | lm loss: 9.302748E+00 | loss scale: 4096.0 | grad norm: 430050.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 99/ 159576 | consumed samples: 1584 | elapsed time per iteration (ms): 13655.7 | learning rate: 4.393E-07 | global batch size: 16 | lm loss: 9.206352E+00 | loss scale: 4096.0 | grad norm: 522965.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100/ 159576 | consumed samples: 1600 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.438E-07 | global batch size: 16 | lm loss: 9.212991E+00 | loss scale: 4096.0 | grad norm: 351294.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
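After the slow first step, iterations settle around 13.5-14 s each. Back-of-the-envelope throughput from the logged fields (the 2048 sequence length is an assumption, not something the log states):

    ELAPSED_MS = 13600.0   # typical "elapsed time per iteration" above
    GLOBAL_BATCH = 16
    SEQ_LEN = 2048         # assumed

    samples_per_s = GLOBAL_BATCH / (ELAPSED_MS / 1000)   # ~1.2 samples/s
    tokens_per_s = samples_per_s * SEQ_LEN               # ~2.4e3 tokens/s
    print(f"{samples_per_s:.2f} samples/s, {tokens_per_s:.0f} tokens/s")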
iteration 101/ 159576 | consumed samples: 1616 | elapsed time per iteration (ms): 14021.3 | learning rate: 4.482E-07 | global batch size: 16 | lm loss: 9.392309E+00 | loss scale: 4096.0 | grad norm: 249407.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102/ 159576 | consumed samples: 1632 | elapsed time per iteration (ms): 13722.5 | learning rate: 4.527E-07 | global batch size: 16 | lm loss: 9.173745E+00 | loss scale: 4096.0 | grad norm: 230190.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103/ 159576 | consumed samples: 1648 | elapsed time per iteration (ms): 13481.3 | learning rate: 4.571E-07 | global batch size: 16 | lm loss: 9.060183E+00 | loss scale: 4096.0 | grad norm: 535519.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104/ 159576 | consumed samples: 1664 | elapsed time per iteration (ms): 13573.2 | learning rate: 4.615E-07 | global batch size: 16 | lm loss: 8.820353E+00 | loss scale: 4096.0 | grad norm: 252106.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105/ 159576 | consumed samples: 1680 | elapsed time per iteration (ms): 13679.8 | learning rate: 4.660E-07 | global batch size: 16 | lm loss: 8.907228E+00 | loss scale: 4096.0 | grad norm: 227304.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106/ 159576 | consumed samples: 1696 | elapsed time per iteration (ms): 13833.6 | learning rate: 4.704E-07 | global batch size: 16 | lm loss: 8.920894E+00 | loss scale: 4096.0 | grad norm: 226622.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107/ 159576 | consumed samples: 1712 | elapsed time per iteration (ms): 13577.9 | learning rate: 4.749E-07 | global batch size: 16 | lm loss: 8.839094E+00 | loss scale: 4096.0 | grad norm: 188033.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108/ 159576 | consumed samples: 1728 | elapsed time per iteration (ms): 13620.7 | learning rate: 4.793E-07 | global batch size: 16 | lm loss: 9.072345E+00 | loss scale: 4096.0 | grad norm: 405511.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109/ 159576 | consumed samples: 1744 | elapsed time per iteration (ms): 13608.5 | learning rate: 4.837E-07 | global batch size: 16 | lm loss: 8.981932E+00 | loss scale: 4096.0 | grad norm: 326365.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110/ 159576 | consumed samples: 1760 | elapsed time per iteration (ms): 13945.7 | learning rate: 4.882E-07 | global batch size: 16 | lm loss: 8.900158E+00 | loss scale: 4096.0 | grad norm: 183771.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 111/ 159576 | consumed samples: 1776 | elapsed time per iteration (ms): 13542.6 | learning rate: 4.926E-07 | global batch size: 16 | lm loss: 8.908926E+00 | loss scale: 4096.0 | grad norm: 189581.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 112/ 159576 | consumed samples: 1792 | elapsed time per iteration (ms): 13715.6 | learning rate: 4.970E-07 | global batch size: 16 | lm loss: 8.738115E+00 | loss scale: 4096.0 | grad norm: 176974.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 113/ 159576 | consumed samples: 1808 | elapsed time per iteration (ms): 13456.9 | learning rate: 5.015E-07 | global batch size: 16 | lm loss: 9.185429E+00 | loss scale: 4096.0 | grad norm: 452577.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 114/ 159576 | consumed samples: 1824 | elapsed time per iteration (ms): 14039.5 | learning rate: 5.059E-07 | global batch size: 16 | lm loss: 9.235853E+00 | loss scale: 4096.0 | grad norm: 567475.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 115/ 159576 | consumed samples: 1840 | elapsed time per iteration (ms): 13568.6 | learning rate: 5.104E-07 | global batch size: 16 | lm loss: 8.848898E+00 | loss scale: 4096.0 | grad norm: 182062.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 116/ 159576 | consumed samples: 1856 | elapsed time per iteration (ms): 13607.1 | learning rate: 5.148E-07 | global batch size: 16 | lm loss: 8.955499E+00 | loss scale: 4096.0 | grad norm: 179172.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 117/ 159576 | consumed samples: 1872 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.192E-07 | global batch size: 16 | lm loss: 8.835221E+00 | loss scale: 4096.0 | grad norm: 168846.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 118/ 159576 | consumed samples: 1888 | elapsed time per iteration (ms): 13424.3 | learning rate: 5.237E-07 | global batch size: 16 | lm loss: 9.120043E+00 | loss scale: 4096.0 | grad norm: 304218.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 119/ 159576 | consumed samples: 1904 | elapsed time per iteration (ms): 13992.7 | learning rate: 5.281E-07 | global batch size: 16 | lm loss: 8.877877E+00 | loss scale: 4096.0 | grad norm: 328004.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 120/ 159576 | consumed samples: 1920 | elapsed time per iteration (ms): 13739.9 | learning rate: 5.325E-07 | global batch size: 16 | lm loss: 9.091492E+00 | loss scale: 4096.0 | grad norm: 542667.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 121/ 159576 | consumed samples: 1936 | elapsed time per iteration (ms): 13438.9 | learning rate: 5.370E-07 | global batch size: 16 | lm loss: 8.963889E+00 | loss scale: 4096.0 | grad norm: 173633.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 122/ 159576 | consumed samples: 1952 | elapsed time per iteration (ms): 13659.9 | learning rate: 5.414E-07 | global batch size: 16 | lm loss: 8.973601E+00 | loss scale: 4096.0 | grad norm: 154883.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123/ 159576 | consumed samples: 1968 | elapsed time per iteration (ms): 14034.9 | learning rate: 5.459E-07 | global batch size: 16 | lm loss: 8.932154E+00 | loss scale: 4096.0 | grad norm: 191305.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 124/ 159576 | consumed samples: 1984 | elapsed time per iteration (ms): 13642.6 | learning rate: 5.503E-07 | global batch size: 16 | lm loss: 8.718765E+00 | loss scale: 4096.0 | grad norm: 141927.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125/ 159576 | consumed samples: 2000 | elapsed time per iteration (ms): 13607.3 | learning rate: 5.547E-07 | global batch size: 16 | lm loss: 9.022717E+00 | loss scale: 4096.0 | grad norm: 530230.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126/ 159576 | consumed samples: 2016 | elapsed time per iteration (ms): 13623.2 | learning rate: 5.592E-07 | global batch size: 16 | lm loss: 9.160154E+00 | loss scale: 4096.0 | grad norm: 525377.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127/ 159576 | consumed samples: 2032 | elapsed time per iteration (ms): 13944.5 | learning rate: 5.636E-07 | global batch size: 16 | lm loss: 8.602621E+00 | loss scale: 4096.0 | grad norm: 180832.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128/ 159576 | consumed samples: 2048 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.680E-07 | global batch size: 16 | lm loss: 8.848473E+00 | loss scale: 4096.0 | grad norm: 159006.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 129/ 159576 | consumed samples: 2064 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.725E-07 | global batch size: 16 | lm loss: 8.697285E+00 | loss scale: 4096.0 | grad norm: 166208.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 130/ 159576 | consumed samples: 2080 | elapsed time per iteration (ms): 13649.8 | learning rate: 5.769E-07 | global batch size: 16 | lm loss: 8.738346E+00 | loss scale: 4096.0 | grad norm: 142582.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 131/ 159576 | consumed samples: 2096 | elapsed time per iteration (ms): 13648.8 | learning rate: 5.814E-07 | global batch size: 16 | lm loss: 8.628532E+00 | loss scale: 4096.0 | grad norm: 119745.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 132/ 159576 | consumed samples: 2112 | elapsed time per iteration (ms): 13855.7 | learning rate: 5.858E-07 | global batch size: 16 | lm loss: 8.681314E+00 | loss scale: 4096.0 | grad norm: 238581.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 133/ 159576 | consumed samples: 2128 | elapsed time per iteration (ms): 13614.3 | learning rate: 5.902E-07 | global batch size: 16 | lm loss: 8.853155E+00 | loss scale: 4096.0 | grad norm: 190597.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 134/ 159576 | consumed samples: 2144 | elapsed time per iteration (ms): 13742.8 | learning rate: 5.947E-07 | global batch size: 16 | lm loss: 8.840850E+00 | loss scale: 4096.0 | grad norm: 157001.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 135/ 159576 | consumed samples: 2160 | elapsed time per iteration (ms): 13481.4 | learning rate: 5.991E-07 | global batch size: 16 | lm loss: 8.721090E+00 | loss scale: 4096.0 | grad norm: 120761.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 136/ 159576 | consumed samples: 2176 | elapsed time per iteration (ms): 14037.0 | learning rate: 6.036E-07 | global batch size: 16 | lm loss: 8.786610E+00 | loss scale: 4096.0 | grad norm: 109166.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 137/ 159576 | consumed samples: 2192 | elapsed time per iteration (ms): 13631.2 | learning rate: 6.080E-07 | global batch size: 16 | lm loss: 8.825349E+00 | loss scale: 4096.0 | grad norm: 393039.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 138/ 159576 | consumed samples: 2208 | elapsed time per iteration (ms): 13698.2 | learning rate: 6.124E-07 | global batch size: 16 | lm loss: 8.681873E+00 | loss scale: 4096.0 | grad norm: 210924.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 139/ 159576 | consumed samples: 2224 | elapsed time per iteration (ms): 13641.8 | learning rate: 6.169E-07 | global batch size: 16 | lm loss: 8.758416E+00 | loss scale: 4096.0 | grad norm: 111138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 140/ 159576 | consumed samples: 2240 | elapsed time per iteration (ms): 13650.3 | learning rate: 6.213E-07 | global batch size: 16 | lm loss: 8.646829E+00 | loss scale: 4096.0 | grad norm: 115663.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 141/ 159576 | consumed samples: 2256 | elapsed time per iteration (ms): 14097.3 | learning rate: 6.257E-07 | global batch size: 16 | lm loss: 8.653087E+00 | loss scale: 4096.0 | grad norm: 142126.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 142/ 159576 | consumed samples: 2272 | elapsed time per iteration (ms): 13468.2 | learning rate: 6.302E-07 | global batch size: 16 | lm loss: 8.647311E+00 | loss scale: 4096.0 | grad norm: 163914.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 143/ 159576 | consumed samples: 2288 | elapsed time per iteration (ms): 13544.7 | learning rate: 6.346E-07 | global batch size: 16 | lm loss: 8.564240E+00 | loss scale: 4096.0 | grad norm: 159952.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 144/ 159576 | consumed samples: 2304 | elapsed time per iteration (ms): 13642.1 | learning rate: 6.391E-07 | global batch size: 16 | lm loss: 8.789017E+00 | loss scale: 4096.0 | grad norm: 169255.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 145/ 159576 | consumed samples: 2320 | elapsed time per iteration (ms): 14181.4 | learning rate: 6.435E-07 | global batch size: 16 | lm loss: 8.811962E+00 | loss scale: 4096.0 | grad norm: 127162.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 146/ 159576 | consumed samples: 2336 | elapsed time per iteration (ms): 13492.3 | learning rate: 6.479E-07 | global batch size: 16 | lm loss: 8.774818E+00 | loss scale: 4096.0 | grad norm: 110483.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 147/ 159576 | consumed samples: 2352 | elapsed time per iteration (ms): 13671.3 | learning rate: 6.524E-07 | global batch size: 16 | lm loss: 8.753700E+00 | loss scale: 4096.0 | grad norm: 128181.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 148/ 159576 | consumed samples: 2368 | elapsed time per iteration (ms): 13675.0 | learning rate: 6.568E-07 | global batch size: 16 | lm loss: 8.742964E+00 | loss scale: 4096.0 | grad norm: 140698.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 149/ 159576 | consumed samples: 2384 | elapsed time per iteration (ms): 14154.8 | learning rate: 6.612E-07 | global batch size: 16 | lm loss: 8.705631E+00 | loss scale: 4096.0 | grad norm: 284561.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 150/ 159576 | consumed samples: 2400 | elapsed time per iteration (ms): 13301.3 | learning rate: 6.657E-07 | global batch size: 16 | lm loss: 8.639321E+00 | loss scale: 4096.0 | grad norm: 158457.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 151/ 159576 | consumed samples: 2416 | elapsed time per iteration (ms): 13553.4 | learning rate: 6.701E-07 | global batch size: 16 | lm loss: 8.747204E+00 | loss scale: 4096.0 | grad norm: 217035.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 152/ 159576 | consumed samples: 2432 | elapsed time per iteration (ms): 13577.6 | learning rate: 6.746E-07 | global batch size: 16 | lm loss: 8.711011E+00 | loss scale: 4096.0 | grad norm: 170149.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 153/ 159576 | consumed samples: 2448 | elapsed time per iteration (ms): 13522.0 | learning rate: 6.790E-07 | global batch size: 16 | lm loss: 8.717499E+00 | loss scale: 4096.0 | grad norm: 103133.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 154/ 159576 | consumed samples: 2464 | elapsed time per iteration (ms): 13883.8 | learning rate: 6.834E-07 | global batch size: 16 | lm loss: 8.587013E+00 | loss scale: 4096.0 | grad norm: 99765.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 155/ 159576 | consumed samples: 2480 | elapsed time per iteration (ms): 13554.0 | learning rate: 6.879E-07 | global batch size: 16 | lm loss: 8.698885E+00 | loss scale: 4096.0 | grad norm: 282680.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 156/ 159576 | consumed samples: 2496 | elapsed time per iteration (ms): 13692.4 | learning rate: 6.923E-07 | global batch size: 16 | lm loss: 9.289864E+00 | loss scale: 4096.0 | grad norm: 609278.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 157/ 159576 | consumed samples: 2512 | elapsed time per iteration (ms): 13306.0 | learning rate: 6.967E-07 | global batch size: 16 | lm loss: 8.803203E+00 | loss scale: 4096.0 | grad norm: 221182.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 02:48:11] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is scheduled to start in 18:26:36 (at 2021-09-24T21:14:48) (1161605 on 'gpu_p13' partition)
[2021-09-24 02:48:11] PULSE: tr8-104B is running for 37:09 since 2021-09-24T02:11:02 (1161730 on 'gpu_p13' partition (r6i4n7,r6i5n[7-8],r6i6n[0,6,8],r6i7n3,r7i2n[2,4-5],r7i3n2,r7i6n[2-4],r7i7n[3,7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i3n[0-2],r8i5n[3-4],r8i7n[3-6,8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
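The PULSE lines interleaved with training output come from a job-watchdog that polls the SLURM queue. A rough sketch of the idea using standard squeue fields (the actual BigScience script and its exact output format may differ):

    import datetime
    import subprocess

    def pulse(job_name: str = "tr8-104B"):
        # %i job id, %T state, %S start time, %R reason or node list
        out = subprocess.run(
            ["squeue", "--name", job_name, "-o", "%i %T %S %R", "--noheader"],
            capture_output=True, text=True,
        ).stdout
        now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        for line in out.splitlines():
            print(f"[{now}] PULSE: {job_name}: {line}")

Having all three entries at once is expected here: the array job waiting in the queue, the backup reservation, and the currently running allocation are separate SLURM jobs with the same name.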
iteration 158/ 159576 | consumed samples: 2528 | elapsed time per iteration (ms): 13873.2 | learning rate: 7.012E-07 | global batch size: 16 | lm loss: 8.628306E+00 | loss scale: 4096.0 | grad norm: 200507.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 159/ 159576 | consumed samples: 2544 | elapsed time per iteration (ms): 13466.2 | learning rate: 7.056E-07 | global batch size: 16 | lm loss: 8.632781E+00 | loss scale: 4096.0 | grad norm: 103638.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 160/ 159576 | consumed samples: 2560 | elapsed time per iteration (ms): 13494.3 | learning rate: 7.101E-07 | global batch size: 16 | lm loss: 8.596104E+00 | loss scale: 4096.0 | grad norm: 92105.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 161/ 159576 | consumed samples: 2576 | elapsed time per iteration (ms): 13517.5 | learning rate: 7.145E-07 | global batch size: 16 | lm loss: 8.408714E+00 | loss scale: 4096.0 | grad norm: 78965.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 162/ 159576 | consumed samples: 2592 | elapsed time per iteration (ms): 13540.1 | learning rate: 7.189E-07 | global batch size: 16 | lm loss: 9.134837E+00 | loss scale: 4096.0 | grad norm: 524949.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 163/ 159576 | consumed samples: 2608 | elapsed time per iteration (ms): 13879.1 | learning rate: 7.234E-07 | global batch size: 16 | lm loss: 8.601346E+00 | loss scale: 4096.0 | grad norm: 206465.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 164/ 159576 | consumed samples: 2624 | elapsed time per iteration (ms): 13564.5 | learning rate: 7.278E-07 | global batch size: 16 | lm loss: 8.734079E+00 | loss scale: 4096.0 | grad norm: 159985.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 165/ 159576 | consumed samples: 2640 | elapsed time per iteration (ms): 13607.4 | learning rate: 7.322E-07 | global batch size: 16 | lm loss: 8.629238E+00 | loss scale: 4096.0 | grad norm: 89678.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 166/ 159576 | consumed samples: 2656 | elapsed time per iteration (ms): 13687.7 | learning rate: 7.367E-07 | global batch size: 16 | lm loss: 8.753635E+00 | loss scale: 4096.0 | grad norm: 108761.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 167/ 159576 | consumed samples: 2672 | elapsed time per iteration (ms): 14101.4 | learning rate: 7.411E-07 | global batch size: 16 | lm loss: 8.647141E+00 | loss scale: 4096.0 | grad norm: 78778.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 168/ 159576 | consumed samples: 2688 | elapsed time per iteration (ms): 13827.5 | learning rate: 7.456E-07 | global batch size: 16 | lm loss: 8.838135E+00 | loss scale: 4096.0 | grad norm: 301360.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 169/ 159576 | consumed samples: 2704 | elapsed time per iteration (ms): 13776.5 | learning rate: 7.500E-07 | global batch size: 16 | lm loss: 8.865972E+00 | loss scale: 4096.0 | grad norm: 230779.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 170/ 159576 | consumed samples: 2720 | elapsed time per iteration (ms): 13667.3 | learning rate: 7.544E-07 | global batch size: 16 | lm loss: 8.716210E+00 | loss scale: 4096.0 | grad norm: 133087.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 171/ 159576 | consumed samples: 2736 | elapsed time per iteration (ms): 13974.1 | learning rate: 7.589E-07 | global batch size: 16 | lm loss: 8.726005E+00 | loss scale: 4096.0 | grad norm: 112595.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 172/ 159576 | consumed samples: 2752 | elapsed time per iteration (ms): 13644.3 | learning rate: 7.633E-07 | global batch size: 16 | lm loss: 8.704071E+00 | loss scale: 4096.0 | grad norm: 92111.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 173/ 159576 | consumed samples: 2768 | elapsed time per iteration (ms): 13586.4 | learning rate: 7.678E-07 | global batch size: 16 | lm loss: 8.823001E+00 | loss scale: 4096.0 | grad norm: 93068.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 174/ 159576 | consumed samples: 2784 | elapsed time per iteration (ms): 13629.3 | learning rate: 7.722E-07 | global batch size: 16 | lm loss: 8.521597E+00 | loss scale: 4096.0 | grad norm: 79887.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 175/ 159576 | consumed samples: 2800 | elapsed time per iteration (ms): 13647.0 | learning rate: 7.766E-07 | global batch size: 16 | lm loss: 9.370278E+00 | loss scale: 4096.0 | grad norm: 576797.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 176/ 159576 | consumed samples: 2816 | elapsed time per iteration (ms): 13993.8 | learning rate: 7.811E-07 | global batch size: 16 | lm loss: 9.255205E+00 | loss scale: 4096.0 | grad norm: 337846.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 177/ 159576 | consumed samples: 2832 | elapsed time per iteration (ms): 13778.2 | learning rate: 7.855E-07 | global batch size: 16 | lm loss: 9.038449E+00 | loss scale: 4096.0 | grad norm: 339366.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 178/ 159576 | consumed samples: 2848 | elapsed time per iteration (ms): 13515.3 | learning rate: 7.899E-07 | global batch size: 16 | lm loss: 8.771539E+00 | loss scale: 4096.0 | grad norm: 216761.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 179/ 159576 | consumed samples: 2864 | elapsed time per iteration (ms): 13657.6 | learning rate: 7.944E-07 | global batch size: 16 | lm loss: 8.718536E+00 | loss scale: 4096.0 | grad norm: 103470.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 180/ 159576 | consumed samples: 2880 | elapsed time per iteration (ms): 14095.5 | learning rate: 7.988E-07 | global batch size: 16 | lm loss: 8.968449E+00 | loss scale: 4096.0 | grad norm: 88300.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 181/ 159576 | consumed samples: 2896 | elapsed time per iteration (ms): 13570.0 | learning rate: 8.033E-07 | global batch size: 16 | lm loss: 8.743597E+00 | loss scale: 4096.0 | grad norm: 73637.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 182/ 159576 | consumed samples: 2912 | elapsed time per iteration (ms): 13631.2 | learning rate: 8.077E-07 | global batch size: 16 | lm loss: 8.650385E+00 | loss scale: 4096.0 | grad norm: 170612.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 183/ 159576 | consumed samples: 2928 | elapsed time per iteration (ms): 13666.1 | learning rate: 8.121E-07 | global batch size: 16 | lm loss: 8.764441E+00 | loss scale: 4096.0 | grad norm: 157032.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 184/ 159576 | consumed samples: 2944 | elapsed time per iteration (ms): 14033.7 | learning rate: 8.166E-07 | global batch size: 16 | lm loss: 8.546231E+00 | loss scale: 4096.0 | grad norm: 68818.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 185/ 159576 | consumed samples: 2960 | elapsed time per iteration (ms): 13755.2 | learning rate: 8.210E-07 | global batch size: 16 | lm loss: 8.605597E+00 | loss scale: 4096.0 | grad norm: 245599.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 186/ 159576 | consumed samples: 2976 | elapsed time per iteration (ms): 13693.9 | learning rate: 8.254E-07 | global batch size: 16 | lm loss: 8.735710E+00 | loss scale: 4096.0 | grad norm: 193090.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 187/ 159576 | consumed samples: 2992 | elapsed time per iteration (ms): 13666.7 | learning rate: 8.299E-07 | global batch size: 16 | lm loss: 8.800616E+00 | loss scale: 4096.0 | grad norm: 121643.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 188/ 159576 | consumed samples: 3008 | elapsed time per iteration (ms): 13617.1 | learning rate: 8.343E-07 | global batch size: 16 | lm loss: 8.450140E+00 | loss scale: 4096.0 | grad norm: 91010.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 189/ 159576 | consumed samples: 3024 | elapsed time per iteration (ms): 14107.4 | learning rate: 8.388E-07 | global batch size: 16 | lm loss: 8.680673E+00 | loss scale: 4096.0 | grad norm: 171815.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 190/ 159576 | consumed samples: 3040 | elapsed time per iteration (ms): 13662.7 | learning rate: 8.432E-07 | global batch size: 16 | lm loss: 8.619300E+00 | loss scale: 4096.0 | grad norm: 80825.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 191/ 159576 | consumed samples: 3056 | elapsed time per iteration (ms): 13715.7 | learning rate: 8.476E-07 | global batch size: 16 | lm loss: 8.438683E+00 | loss scale: 4096.0 | grad norm: 68255.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 192/ 159576 | consumed samples: 3072 | elapsed time per iteration (ms): 13611.5 | learning rate: 8.521E-07 | global batch size: 16 | lm loss: 8.685935E+00 | loss scale: 4096.0 | grad norm: 100702.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration (ms): 13611.5 | learning rate: 8.521E-07 | global batch size: 16 | lm loss: 8.685935E+00 | loss scale: 4096.0 | grad norm: 100702.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 193/ 159576 | consumed samples: 3088 | elapsed time per iteration (ms): 14234.2 | learning rate: 8.565E-07 | global batch size: 16 | lm loss: 8.644808E+00 | loss scale: 4096.0 | grad norm: 193299.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 194/ 159576 | consumed samples: 3104 | elapsed time per iteration (ms): 13631.4 | learning rate: 8.609E-07 | global batch size: 16 | lm loss: 8.574228E+00 | loss scale: 4096.0 | grad norm: 141638.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 195/ 159576 | consumed samples: 3120 | elapsed time per iteration (ms): 13610.1 | learning rate: 8.654E-07 | global batch size: 16 | lm loss: 8.461662E+00 | loss scale: 4096.0 | grad norm: 102623.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 196/ 159576 | consumed samples: 3136 | elapsed time per iteration (ms): 13581.2 | learning rate: 8.698E-07 | global batch size: 16 | lm loss: 8.478310E+00 | loss scale: 4096.0 | grad norm: 64740.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 197/ 159576 | consumed samples: 3152 | elapsed time per iteration (ms): 13626.3 | learning rate: 8.743E-07 | global batch size: 16 | lm loss: 8.468125E+00 | loss scale: 4096.0 | grad norm: 113590.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 198/ 159576 | consumed samples: 3168 | elapsed time per iteration (ms): 14045.8 | learning rate: 8.787E-07 | global batch size: 16 | lm loss: 8.800446E+00 | loss scale: 4096.0 | grad norm: 157117.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 199/ 159576 | consumed samples: 3184 | elapsed time per iteration (ms): 13670.2 | learning rate: 8.831E-07 | global batch size: 16 | lm loss: 8.530574E+00 | loss scale: 4096.0 | grad norm: 71020.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 200/ 159576 | consumed samples: 3200 | elapsed time per iteration (ms): 13673.4 | learning rate: 8.876E-07 | global batch size: 16 | lm loss: 8.573134E+00 | loss scale: 4096.0 | grad norm: 68974.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 201/ 159576 | consumed samples: 3216 | elapsed time per iteration (ms): 13793.0 | learning rate: 8.920E-07 | global batch size: 16 | lm loss: 8.408599E+00 | loss scale: 4096.0 | grad norm: 69080.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 202/ 159576 | consumed samples: 3232 | elapsed time per iteration (ms): 13826.3 | learning rate: 8.964E-07 | global batch size: 16 | lm loss: 8.511511E+00 | loss scale: 4096.0 | grad norm: 111260.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 203/ 159576 | consumed samples: 3248 | elapsed time per iteration (ms): 13532.8 | learning rate: 9.009E-07 | global batch size: 16 | lm loss: 8.359414E+00 | loss scale: 4096.0 | grad norm: 178104.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 204/ 159576 | consumed samples: 3264 | elapsed time per iteration (ms): 13664.5 | learning rate: 9.053E-07 | global batch size: 16 | lm loss: 8.641071E+00 | loss scale: 4096.0 | grad norm: 200697.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 205/ 159576 | consumed samples: 3280 | elapsed time per iteration (ms): 13644.0 | learning rate: 9.098E-07 | global batch size: 16 | lm loss: 8.579686E+00 | loss scale: 4096.0 | grad norm: 127286.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 206/ 159576 | consumed samples: 3296 | elapsed time per iteration (ms): 14372.0 | learning rate: 9.142E-07 | global batch size: 16 | lm loss: 8.340457E+00 | loss scale: 4096.0 | grad norm: 79901.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 207/ 159576 | consumed samples: 3312 | elapsed time per iteration (ms): 13542.0 | learning rate: 9.186E-07 | global batch size: 16 | lm loss: 8.573874E+00 | loss scale: 4096.0 | grad norm: 54182.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 208/ 159576 | consumed samples: 3328 | elapsed time per iteration (ms): 13770.4 | learning rate: 9.231E-07 | global batch size: 16 | lm loss: 8.671753E+00 | loss scale: 4096.0 | grad norm: 118528.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 209/ 159576 | consumed samples: 3344 | elapsed time per iteration (ms): 13735.7 | learning rate: 9.275E-07 | global batch size: 16 | lm loss: 8.323320E+00 | loss scale: 4096.0 | grad norm: 84996.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 210/ 159576 | consumed samples: 3360 | elapsed time per iteration (ms): 13465.7 | learning rate: 9.320E-07 | global batch size: 16 | lm loss: 8.521966E+00 | loss scale: 4096.0 | grad norm: 58490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 211/ 159576 | consumed samples: 3376 | elapsed time per iteration (ms): 14045.3 | learning rate: 9.364E-07 | global batch size: 16 | lm loss: 8.366361E+00 | loss scale: 4096.0 | grad norm: 60420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 212/ 159576 | consumed samples: 3392 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.408E-07 | global batch size: 16 | lm loss: 8.510538E+00 | loss scale: 4096.0 | grad norm: 107003.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 213/ 159576 | consumed samples: 3408 | elapsed time per iteration (ms): 13705.1 | learning rate: 9.453E-07 | global batch size: 16 | lm loss: 8.749462E+00 | loss scale: 4096.0 | grad norm: 127548.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 214/ 159576 | consumed samples: 3424 | elapsed time per iteration (ms): 13700.1 | learning rate: 9.497E-07 | global batch size: 16 | lm loss: 8.406161E+00 | loss scale: 4096.0 | grad norm: 77133.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 215/ 159576 | consumed samples: 3440 | elapsed time per iteration (ms): 14278.2 | learning rate: 9.541E-07 | global batch size: 16 | lm loss: 8.418405E+00 | loss scale: 4096.0 | grad norm: 62254.176 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 216/ 159576 | consumed samples: 3456 | elapsed time per iteration (ms): 13592.8 | learning rate: 9.586E-07 | global batch size: 16 | lm loss: 8.472538E+00 | loss scale: 4096.0 | grad norm: 50530.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 217/ 159576 | consumed samples: 3472 | elapsed time per iteration (ms): 13518.7 | learning rate: 9.630E-07 | global batch size: 16 | lm loss: 8.448650E+00 | loss scale: 4096.0 | grad norm: 80646.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 218/ 159576 | consumed samples: 3488 | elapsed time per iteration (ms): 13661.2 | learning rate: 9.675E-07 | global batch size: 16 | lm loss: 7.734177E+00 | loss scale: 4096.0 | grad norm: 149486.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 219/ 159576 | consumed samples: 3504 | elapsed time per iteration (ms): 14068.7 | learning rate: 9.719E-07 | global batch size: 16 | lm loss: 8.294590E+00 | loss scale: 4096.0 | grad norm: 56571.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 220/ 159576 | consumed samples: 3520 | elapsed time per iteration (ms): 13630.3 | learning rate: 9.763E-07 | global batch size: 16 | lm loss: 8.257124E+00 | loss scale: 4096.0 | grad norm: 62046.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 221/ 159576 | consumed samples: 3536 | elapsed time per iteration (ms): 13703.1 | learning rate: 9.808E-07 | global batch size: 16 | lm loss: 8.288898E+00 | loss scale: 4096.0 | grad norm: 59852.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 222/ 159576 | consumed samples: 3552 | elapsed time per iteration (ms): 13772.5 | learning rate: 9.852E-07 | global batch size: 16 | lm loss: 8.155066E+00 | loss scale: 4096.0 | grad norm: 58014.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 223/ 159576 | consumed samples: 3568 | elapsed time per iteration (ms): 13771.9 | learning rate: 9.896E-07 | global batch size: 16 | lm loss: 8.263331E+00 | loss scale: 4096.0 | grad norm: 63268.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 224/ 159576 | consumed samples: 3584 | elapsed time per iteration (ms): 14010.9 | learning rate: 9.941E-07 | global batch size: 16 | lm loss: 8.163802E+00 | loss scale: 4096.0 | grad norm: 57272.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 225/ 159576 | consumed samples: 3600 | elapsed time per iteration (ms): 13593.2 | learning rate: 9.985E-07 | global batch size: 16 | lm loss: 8.163125E+00 | loss scale: 4096.0 | grad norm: 42586.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 226/ 159576 | consumed samples: 3616 | elapsed time per iteration (ms): 13655.1 | learning rate: 1.003E-06 | global batch size: 16 | lm loss: 8.360060E+00 | loss scale: 4096.0 | grad norm: 122218.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 227/ 159576 | consumed samples: 3632 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.007E-06 | global batch size: 16 | 
lm loss: 8.255043E+00 | loss scale: 4096.0 | grad norm: 85521.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 228/ 159576 | consumed samples: 3648 | elapsed time per iteration (ms): 14030.4 | learning rate: 1.012E-06 | global batch size: 16 | lm loss: 8.261985E+00 | loss scale: 4096.0 | grad norm: 67005.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 229/ 159576 | consumed samples: 3664 | elapsed time per iteration (ms): 13712.9 | learning rate: 1.016E-06 | global batch size: 16 | lm loss: 8.186491E+00 | loss scale: 4096.0 | grad norm: 56484.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 230/ 159576 | consumed samples: 3680 | elapsed time per iteration (ms): 13908.9 | learning rate: 1.021E-06 | global batch size: 16 | lm loss: 8.405298E+00 | loss scale: 4096.0 | grad norm: 76846.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 231/ 159576 | consumed samples: 3696 | elapsed time per iteration (ms): 13436.7 | learning rate: 1.025E-06 | global batch size: 16 | lm loss: 8.396565E+00 | loss scale: 4096.0 | grad norm: 65903.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 232/ 159576 | consumed samples: 3712 | elapsed time per iteration (ms): 13847.3 | learning rate: 1.030E-06 | global batch size: 16 | lm loss: 8.280029E+00 | loss scale: 4096.0 | grad norm: 49376.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 233/ 159576 | consumed samples: 3728 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.034E-06 | global batch size: 16 | lm loss: 8.356775E+00 | loss scale: 4096.0 | grad norm: 59866.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 234/ 159576 | consumed samples: 3744 | elapsed time per iteration (ms): 13586.3 | learning rate: 1.038E-06 | global batch size: 16 | lm loss: 8.429869E+00 | loss scale: 4096.0 | grad norm: 177436.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 235/ 159576 | consumed samples: 3760 | elapsed time per iteration (ms): 13599.7 | learning rate: 1.043E-06 | global batch size: 16 | lm loss: 8.434436E+00 | loss scale: 4096.0 | grad norm: 135413.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 236/ 159576 | consumed samples: 3776 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.047E-06 | global batch size: 16 | lm loss: 8.271558E+00 | loss scale: 4096.0 | grad norm: 90861.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 237/ 159576 | consumed samples: 3792 | elapsed time per iteration (ms): 14163.4 | learning rate: 1.052E-06 | global batch size: 16 | lm loss: 8.303068E+00 | loss scale: 4096.0 | grad norm: 54299.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 238/ 159576 | consumed samples: 3808 | elapsed time per iteration (ms): 13595.2 | learning rate: 1.056E-06 | global batch size: 16 | lm loss: 8.246891E+00 | loss scale: 4096.0 | grad norm: 58398.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 239/ 159576 | consumed samples: 3824 | elapsed time per 
iteration (ms): 13633.1 | learning rate: 1.061E-06 | global batch size: 16 | lm loss: 8.223282E+00 | loss scale: 4096.0 | grad norm: 58574.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 240/ 159576 | consumed samples: 3840 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.065E-06 | global batch size: 16 | lm loss: 8.408007E+00 | loss scale: 4096.0 | grad norm: 128668.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 241/ 159576 | consumed samples: 3856 | elapsed time per iteration (ms): 14073.7 | learning rate: 1.070E-06 | global batch size: 16 | lm loss: 8.490035E+00 | loss scale: 4096.0 | grad norm: 228763.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 242/ 159576 | consumed samples: 3872 | elapsed time per iteration (ms): 13568.7 | learning rate: 1.074E-06 | global batch size: 16 | lm loss: 8.217072E+00 | loss scale: 4096.0 | grad norm: 54955.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 243/ 159576 | consumed samples: 3888 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.078E-06 | global batch size: 16 | lm loss: 8.280759E+00 | loss scale: 4096.0 | grad norm: 70277.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 244/ 159576 | consumed samples: 3904 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.083E-06 | global batch size: 16 | lm loss: 8.266622E+00 | loss scale: 4096.0 | grad norm: 52088.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 245/ 159576 | consumed samples: 3920 | elapsed time per iteration (ms): 13760.9 | learning rate: 1.087E-06 | global batch size: 16 | lm loss: 8.186391E+00 | loss scale: 4096.0 | grad norm: 45303.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 246/ 159576 | consumed samples: 3936 | elapsed time per iteration (ms): 13869.6 | learning rate: 1.092E-06 | global batch size: 16 | lm loss: 8.217053E+00 | loss scale: 4096.0 | grad norm: 66052.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 247/ 159576 | consumed samples: 3952 | elapsed time per iteration (ms): 13595.0 | learning rate: 1.096E-06 | global batch size: 16 | lm loss: 8.218720E+00 | loss scale: 4096.0 | grad norm: 63154.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 248/ 159576 | consumed samples: 3968 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.101E-06 | global batch size: 16 | lm loss: 8.214328E+00 | loss scale: 4096.0 | grad norm: 54827.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 249/ 159576 | consumed samples: 3984 | elapsed time per iteration (ms): 13572.6 | learning rate: 1.105E-06 | global batch size: 16 | lm loss: 8.289627E+00 | loss scale: 4096.0 | grad norm: 112939.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 250/ 159576 | consumed samples: 4000 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.109E-06 | global batch size: 16 | lm loss: 8.362014E+00 | loss scale: 4096.0 | grad norm: 56746.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
time (ms) iteration 251/ 159576 | consumed samples: 4016 | elapsed time per iteration (ms): 13620.5 | learning rate: 1.114E-06 | global batch size: 16 | lm loss: 8.189938E+00 | loss scale: 4096.0 | grad norm: 56152.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 252/ 159576 | consumed samples: 4032 | elapsed time per iteration (ms): 13708.2 | learning rate: 1.118E-06 | global batch size: 16 | lm loss: 8.356908E+00 | loss scale: 4096.0 | grad norm: 78498.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 253/ 159576 | consumed samples: 4048 | elapsed time per iteration (ms): 13478.4 | learning rate: 1.123E-06 | global batch size: 16 | lm loss: 8.047684E+00 | loss scale: 4096.0 | grad norm: 66252.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 254/ 159576 | consumed samples: 4064 | elapsed time per iteration (ms): 14231.8 | learning rate: 1.127E-06 | global batch size: 16 | lm loss: 8.279363E+00 | loss scale: 4096.0 | grad norm: 85125.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 255/ 159576 | consumed samples: 4080 | elapsed time per iteration (ms): 13522.4 | learning rate: 1.132E-06 | global batch size: 16 | lm loss: 8.159877E+00 | loss scale: 4096.0 | grad norm: 48952.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 256/ 159576 | consumed samples: 4096 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.136E-06 | global batch size: 16 | lm loss: 8.154376E+00 | loss scale: 4096.0 | grad norm: 41715.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 257/ 159576 | consumed samples: 4112 | elapsed time per iteration (ms): 13537.5 | learning rate: 1.141E-06 | global batch size: 16 | lm loss: 8.247561E+00 | loss scale: 4096.0 | grad norm: 57864.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 258/ 159576 | consumed samples: 4128 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.145E-06 | global batch size: 16 | lm loss: 8.167631E+00 | loss scale: 4096.0 | grad norm: 45439.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 259/ 159576 | consumed samples: 4144 | elapsed time per iteration (ms): 14023.4 | learning rate: 1.149E-06 | global batch size: 16 | lm loss: 8.081510E+00 | loss scale: 4096.0 | grad norm: 54108.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 260/ 159576 | consumed samples: 4160 | elapsed time per iteration (ms): 13447.5 | learning rate: 1.154E-06 | global batch size: 16 | lm loss: 8.074065E+00 | loss scale: 4096.0 | grad norm: 45799.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 261/ 159576 | consumed samples: 4176 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.158E-06 | global batch size: 16 | lm loss: 8.134088E+00 | loss scale: 4096.0 | grad norm: 34426.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 262/ 159576 | consumed samples: 4192 | elapsed time per iteration (ms): 13632.5 | learning rate: 1.163E-06 | global batch size: 16 | lm loss: 8.331153E+00 | loss scale: 4096.0 | grad norm: 241742.321 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 263/ 159576 | consumed samples: 4208 | elapsed time per iteration (ms): 14049.0 | learning rate: 1.167E-06 | global batch size: 16 | lm loss: 8.300336E+00 | loss scale: 4096.0 | grad norm: 89382.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 264/ 159576 | consumed samples: 4224 | elapsed time per iteration (ms): 13554.0 | learning rate: 1.172E-06 | global batch size: 16 | lm loss: 8.285131E+00 | loss scale: 4096.0 | grad norm: 56471.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 265/ 159576 | consumed samples: 4240 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.176E-06 | global batch size: 16 | lm loss: 8.247953E+00 | loss scale: 4096.0 | grad norm: 59934.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 266/ 159576 | consumed samples: 4256 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.180E-06 | global batch size: 16 | lm loss: 8.086367E+00 | loss scale: 4096.0 | grad norm: 49794.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 267/ 159576 | consumed samples: 4272 | elapsed time per iteration (ms): 13925.6 | learning rate: 1.185E-06 | global batch size: 16 | lm loss: 8.364625E+00 | loss scale: 4096.0 | grad norm: 198667.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 268/ 159576 | consumed samples: 4288 | elapsed time per iteration (ms): 13685.9 | learning rate: 1.189E-06 | global batch size: 16 | lm loss: 8.378025E+00 | loss scale: 4096.0 | grad norm: 206726.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 269/ 159576 | consumed samples: 4304 | elapsed time per iteration (ms): 13784.2 | learning rate: 1.194E-06 | global batch size: 16 | lm loss: 8.309950E+00 | loss scale: 4096.0 | grad norm: 102692.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 270/ 159576 | consumed samples: 4320 | elapsed time per iteration (ms): 13426.6 | learning rate: 1.198E-06 | global batch size: 16 | lm loss: 8.437682E+00 | loss scale: 4096.0 | grad norm: 53779.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 271/ 159576 | consumed samples: 4336 | elapsed time per iteration (ms): 13590.5 | learning rate: 1.203E-06 | global batch size: 16 | lm loss: 8.180303E+00 | loss scale: 4096.0 | grad norm: 41837.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 272/ 159576 | consumed samples: 4352 | elapsed time per iteration (ms): 13918.1 | learning rate: 1.207E-06 | global batch size: 16 | lm loss: 8.269817E+00 | loss scale: 4096.0 | grad norm: 60250.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 273/ 159576 | consumed samples: 4368 | elapsed time per iteration (ms): 13764.9 | learning rate: 1.212E-06 | global batch size: 16 | lm loss: 8.196259E+00 | loss scale: 4096.0 | grad norm: 51310.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 274/ 159576 | consumed samples: 4384 | elapsed time per iteration (ms): 13543.7 | learning rate: 1.216E-06 | global batch size: 16 | lm 
loss: 8.111527E+00 | loss scale: 4096.0 | grad norm: 62869.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 275/ 159576 | consumed samples: 4400 | elapsed time per iteration (ms): 13741.6 | learning rate: 1.220E-06 | global batch size: 16 | lm loss: 8.196915E+00 | loss scale: 4096.0 | grad norm: 56382.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 276/ 159576 | consumed samples: 4416 | elapsed time per iteration (ms): 14418.6 | learning rate: 1.225E-06 | global batch size: 16 | lm loss: 8.163618E+00 | loss scale: 4096.0 | grad norm: 59897.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 277/ 159576 | consumed samples: 4432 | elapsed time per iteration (ms): 13488.6 | learning rate: 1.229E-06 | global batch size: 16 | lm loss: 8.232466E+00 | loss scale: 4096.0 | grad norm: 106883.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 278/ 159576 | consumed samples: 4448 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.234E-06 | global batch size: 16 | lm loss: 8.285415E+00 | loss scale: 4096.0 | grad norm: 52155.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 279/ 159576 | consumed samples: 4464 | elapsed time per iteration (ms): 13663.3 | learning rate: 1.238E-06 | global batch size: 16 | lm loss: 8.221471E+00 | loss scale: 4096.0 | grad norm: 43151.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 280/ 159576 | consumed samples: 4480 | elapsed time per iteration (ms): 13783.3 | learning rate: 1.243E-06 | global batch size: 16 | lm loss: 7.827011E+00 | loss scale: 4096.0 | grad norm: 60081.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 281/ 159576 | consumed samples: 4496 | elapsed time per iteration (ms): 13993.1 | learning rate: 1.247E-06 | global batch size: 16 | lm loss: 8.016405E+00 | loss scale: 4096.0 | grad norm: 60969.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 282/ 159576 | consumed samples: 4512 | elapsed time per iteration (ms): 13747.2 | learning rate: 1.251E-06 | global batch size: 16 | lm loss: 8.205744E+00 | loss scale: 4096.0 | grad norm: 64657.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 283/ 159576 | consumed samples: 4528 | elapsed time per iteration (ms): 13732.1 | learning rate: 1.256E-06 | global batch size: 16 | lm loss: 8.225381E+00 | loss scale: 4096.0 | grad norm: 46007.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 284/ 159576 | consumed samples: 4544 | elapsed time per iteration (ms): 13701.8 | learning rate: 1.260E-06 | global batch size: 16 | lm loss: 8.069484E+00 | loss scale: 4096.0 | grad norm: 50539.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 285/ 159576 | consumed samples: 4560 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.265E-06 | global batch size: 16 | lm loss: 8.313256E+00 | loss scale: 4096.0 | grad norm: 75301.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 286/ 159576 | consumed samples: 4576 | elapsed time per iteration 
(ms): 13700.1 | learning rate: 1.269E-06 | global batch size: 16 | lm loss: 8.296308E+00 | loss scale: 4096.0 | grad norm: 109402.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 287/ 159576 | consumed samples: 4592 | elapsed time per iteration (ms): 13678.1 | learning rate: 1.274E-06 | global batch size: 16 | lm loss: 8.245502E+00 | loss scale: 4096.0 | grad norm: 53639.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 288/ 159576 | consumed samples: 4608 | elapsed time per iteration (ms): 13698.6 | learning rate: 1.278E-06 | global batch size: 16 | lm loss: 8.137961E+00 | loss scale: 4096.0 | grad norm: 42750.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 289/ 159576 | consumed samples: 4624 | elapsed time per iteration (ms): 14172.7 | learning rate: 1.283E-06 | global batch size: 16 | lm loss: 8.187901E+00 | loss scale: 4096.0 | grad norm: 108265.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 290/ 159576 | consumed samples: 4640 | elapsed time per iteration (ms): 13663.7 | learning rate: 1.287E-06 | global batch size: 16 | lm loss: 8.092007E+00 | loss scale: 4096.0 | grad norm: 61613.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 291/ 159576 | consumed samples: 4656 | elapsed time per iteration (ms): 13802.2 | learning rate: 1.291E-06 | global batch size: 16 | lm loss: 8.140871E+00 | loss scale: 4096.0 | grad norm: 73138.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 292/ 159576 | consumed samples: 4672 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.296E-06 | global batch size: 16 | lm loss: 8.096482E+00 | loss scale: 4096.0 | grad norm: 56947.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 293/ 159576 | consumed samples: 4688 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.300E-06 | global batch size: 16 | lm loss: 8.261303E+00 | loss scale: 4096.0 | grad norm: 50306.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 294/ 159576 | consumed samples: 4704 | elapsed time per iteration (ms): 13953.1 | learning rate: 1.305E-06 | global batch size: 16 | lm loss: 8.088846E+00 | loss scale: 4096.0 | grad norm: 70651.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 295/ 159576 | consumed samples: 4720 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.309E-06 | global batch size: 16 | lm loss: 8.216883E+00 | loss scale: 4096.0 | grad norm: 109748.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 296/ 159576 | consumed samples: 4736 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.314E-06 | global batch size: 16 | lm loss: 8.011025E+00 | loss scale: 4096.0 | grad norm: 57863.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 297/ 159576 | consumed samples: 4752 | elapsed time per iteration (ms): 13766.7 | learning rate: 1.318E-06 | global batch size: 16 | lm loss: 8.023094E+00 | loss scale: 4096.0 | grad norm: 39732.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 298/ 159576 | consumed samples: 4768 | elapsed time per iteration (ms): 14056.0 | learning rate: 1.322E-06 | global batch size: 16 | lm loss: 8.085699E+00 | loss scale: 4096.0 | grad norm: 93534.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 299/ 159576 | consumed samples: 4784 | elapsed time per iteration (ms): 13507.1 | learning rate: 1.327E-06 | global batch size: 16 | lm loss: 8.410425E+00 | loss scale: 4096.0 | grad norm: 42550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 300/ 159576 | consumed samples: 4800 | elapsed time per iteration (ms): 13670.9 | learning rate: 1.331E-06 | global batch size: 16 | lm loss: 8.125405E+00 | loss scale: 4096.0 | grad norm: 37244.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 301/ 159576 | consumed samples: 4816 | elapsed time per iteration (ms): 13643.0 | learning rate: 1.336E-06 | global batch size: 16 | lm loss: 7.945562E+00 | loss scale: 4096.0 | grad norm: 37921.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 302/ 159576 | consumed samples: 4832 | elapsed time per iteration (ms): 14097.2 | learning rate: 1.340E-06 | global batch size: 16 | lm loss: 8.073545E+00 | loss scale: 4096.0 | grad norm: 80879.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 303/ 159576 | consumed samples: 4848 | elapsed time per iteration (ms): 13625.2 | learning rate: 1.345E-06 | global batch size: 16 | lm loss: 8.224352E+00 | loss scale: 4096.0 | grad norm: 75920.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 304/ 159576 | consumed samples: 4864 | elapsed time per iteration (ms): 13709.0 | learning rate: 1.349E-06 | global batch size: 16 | lm loss: 8.025059E+00 | loss scale: 4096.0 | grad norm: 39535.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 305/ 159576 | consumed samples: 4880 | elapsed time per iteration (ms): 13741.5 | learning rate: 1.354E-06 | global batch size: 16 | lm loss: 8.094482E+00 | loss scale: 4096.0 | grad norm: 40630.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 306/ 159576 | consumed samples: 4896 | elapsed time per iteration (ms): 13523.7 | learning rate: 1.358E-06 | global batch size: 16 | lm loss: 8.135887E+00 | loss scale: 4096.0 | grad norm: 80825.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 307/ 159576 | consumed samples: 4912 | elapsed time per iteration (ms): 14093.4 | learning rate: 1.362E-06 | global batch size: 16 | lm loss: 8.292034E+00 | loss scale: 4096.0 | grad norm: 86171.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 308/ 159576 | consumed samples: 4928 | elapsed time per iteration (ms): 13647.9 | learning rate: 1.367E-06 | global batch size: 16 | lm loss: 8.204563E+00 | loss scale: 4096.0 | grad norm: 46698.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 309/ 159576 | consumed samples: 4944 | elapsed time per iteration (ms): 13637.2 | learning rate: 1.371E-06 | global batch size: 16 | lm loss: 8.033182E+00 | loss scale: 4096.0 | grad norm: 42089.185 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 310/ 159576 | consumed samples: 4960 | elapsed time per iteration (ms): 13700.6 | learning rate: 1.376E-06 | global batch size: 16 | lm loss: 8.048797E+00 | loss scale: 4096.0 | grad norm: 56022.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 311/ 159576 | consumed samples: 4976 | elapsed time per iteration (ms): 14085.5 | learning rate: 1.380E-06 | global batch size: 16 | lm loss: 7.623003E+00 | loss scale: 4096.0 | grad norm: 72171.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 312/ 159576 | consumed samples: 4992 | elapsed time per iteration (ms): 13830.9 | learning rate: 1.385E-06 | global batch size: 16 | lm loss: 8.082812E+00 | loss scale: 4096.0 | grad norm: 39681.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 313/ 159576 | consumed samples: 5008 | elapsed time per iteration (ms): 13533.9 | learning rate: 1.389E-06 | global batch size: 16 | lm loss: 8.116117E+00 | loss scale: 4096.0 | grad norm: 33726.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 314/ 159576 | consumed samples: 5024 | elapsed time per iteration (ms): 13637.3 | learning rate: 1.393E-06 | global batch size: 16 | lm loss: 8.210217E+00 | loss scale: 4096.0 | grad norm: 89402.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 315/ 159576 | consumed samples: 5040 | elapsed time per iteration (ms): 14136.6 | learning rate: 1.398E-06 | global batch size: 16 | lm loss: 7.798199E+00 | loss scale: 4096.0 | grad norm: 83566.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 316/ 159576 | consumed samples: 5056 | elapsed time per iteration (ms): 13651.3 | learning rate: 1.402E-06 | global batch size: 16 | lm loss: 8.066372E+00 | loss scale: 4096.0 | grad norm: 38768.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 317/ 159576 | consumed samples: 5072 | elapsed time per iteration (ms): 13641.7 | learning rate: 1.407E-06 | global batch size: 16 | lm loss: 7.876265E+00 | loss scale: 4096.0 | grad norm: 36174.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 318/ 159576 | consumed samples: 5088 | elapsed time per iteration (ms): 13653.8 | learning rate: 1.411E-06 | global batch size: 16 | lm loss: 7.979768E+00 | loss scale: 4096.0 | grad norm: 66651.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 319/ 159576 | consumed samples: 5104 | elapsed time per iteration (ms): 13755.9 | learning rate: 1.416E-06 | global batch size: 16 | lm loss: 8.094232E+00 | loss scale: 4096.0 | grad norm: 79088.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 320/ 159576 | consumed samples: 5120 | elapsed time per iteration (ms): 13900.8 | learning rate: 1.420E-06 | global batch size: 16 | lm loss: 8.113304E+00 | loss scale: 4096.0 | grad norm: 52331.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 321/ 159576 | consumed samples: 5136 | elapsed time per iteration (ms): 13649.9 | learning rate: 1.425E-06 | global batch size: 16 | lm loss: 
8.128990E+00 | loss scale: 4096.0 | grad norm: 46927.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 322/ 159576 | consumed samples: 5152 | elapsed time per iteration (ms): 13693.6 | learning rate: 1.429E-06 | global batch size: 16 | lm loss: 8.486778E+00 | loss scale: 4096.0 | grad norm: 89462.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 323/ 159576 | consumed samples: 5168 | elapsed time per iteration (ms): 13699.8 | learning rate: 1.433E-06 | global batch size: 16 | lm loss: 8.051263E+00 | loss scale: 4096.0 | grad norm: 42680.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 324/ 159576 | consumed samples: 5184 | elapsed time per iteration (ms): 14041.8 | learning rate: 1.438E-06 | global batch size: 16 | lm loss: 8.181097E+00 | loss scale: 4096.0 | grad norm: 43801.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 325/ 159576 | consumed samples: 5200 | elapsed time per iteration (ms): 13711.0 | learning rate: 1.442E-06 | global batch size: 16 | lm loss: 8.171723E+00 | loss scale: 4096.0 | grad norm: 47748.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 326/ 159576 | consumed samples: 5216 | elapsed time per iteration (ms): 13743.3 | learning rate: 1.447E-06 | global batch size: 16 | lm loss: 8.035454E+00 | loss scale: 4096.0 | grad norm: 58353.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 327/ 159576 | consumed samples: 5232 | elapsed time per iteration (ms): 13602.7 | learning rate: 1.451E-06 | global batch size: 16 | lm loss: 8.021453E+00 | loss scale: 4096.0 | grad norm: 44165.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 328/ 159576 | consumed samples: 5248 | elapsed time per iteration (ms): 13748.9 | learning rate: 1.456E-06 | global batch size: 16 | lm loss: 8.051726E+00 | loss scale: 4096.0 | grad norm: 35138.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 329/ 159576 | consumed samples: 5264 | elapsed time per iteration (ms): 13961.7 | learning rate: 1.460E-06 | global batch size: 16 | lm loss: 7.960547E+00 | loss scale: 4096.0 | grad norm: 41197.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 330/ 159576 | consumed samples: 5280 | elapsed time per iteration (ms): 13633.4 | learning rate: 1.464E-06 | global batch size: 16 | lm loss: 8.084079E+00 | loss scale: 4096.0 | grad norm: 43199.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 331/ 159576 | consumed samples: 5296 | elapsed time per iteration (ms): 13678.9 | learning rate: 1.469E-06 | global batch size: 16 | lm loss: 8.243130E+00 | loss scale: 4096.0 | grad norm: 39935.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 332/ 159576 | consumed samples: 5312 | elapsed time per iteration (ms): 13653.3 | learning rate: 1.473E-06 | global batch size: 16 | lm loss: 8.148146E+00 | loss scale: 4096.0 | grad norm: 31710.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 333/ 159576 | consumed samples: 5328 | elapsed time per iteration (ms): 
13982.9 | learning rate: 1.478E-06 | global batch size: 16 | lm loss: 8.055049E+00 | loss scale: 4096.0 | grad norm: 40555.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 334/ 159576 | consumed samples: 5344 | elapsed time per iteration (ms): 13576.5 | learning rate: 1.482E-06 | global batch size: 16 | lm loss: 8.154724E+00 | loss scale: 4096.0 | grad norm: 98189.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 335/ 159576 | consumed samples: 5360 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.487E-06 | global batch size: 16 | lm loss: 8.056485E+00 | loss scale: 4096.0 | grad norm: 53277.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 336/ 159576 | consumed samples: 5376 | elapsed time per iteration (ms): 13667.7 | learning rate: 1.491E-06 | global batch size: 16 | lm loss: 7.902112E+00 | loss scale: 4096.0 | grad norm: 35520.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 337/ 159576 | consumed samples: 5392 | elapsed time per iteration (ms): 14189.1 | learning rate: 1.496E-06 | global batch size: 16 | lm loss: 8.211933E+00 | loss scale: 4096.0 | grad norm: 102636.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 338/ 159576 | consumed samples: 5408 | elapsed time per iteration (ms): 13538.3 | learning rate: 1.500E-06 | global batch size: 16 | lm loss: 8.077993E+00 | loss scale: 4096.0 | grad norm: 74161.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 339/ 159576 | consumed samples: 5424 | elapsed time per iteration (ms): 13690.1 | learning rate: 1.504E-06 | global batch size: 16 | lm loss: 8.002722E+00 | loss scale: 4096.0 | grad norm: 41178.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 340/ 159576 | consumed samples: 5440 | elapsed time per iteration (ms): 13761.4 | learning rate: 1.509E-06 | global batch size: 16 | lm loss: 8.070647E+00 | loss scale: 4096.0 | grad norm: 146660.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 341/ 159576 | consumed samples: 5456 | elapsed time per iteration (ms): 13679.6 | learning rate: 1.513E-06 | global batch size: 16 | lm loss: 8.211810E+00 | loss scale: 4096.0 | grad norm: 56011.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 342/ 159576 | consumed samples: 5472 | elapsed time per iteration (ms): 13958.7 | learning rate: 1.518E-06 | global batch size: 16 | lm loss: 8.028828E+00 | loss scale: 4096.0 | grad norm: 45507.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 343/ 159576 | consumed samples: 5488 | elapsed time per iteration (ms): 13796.1 | learning rate: 1.522E-06 | global batch size: 16 | lm loss: 8.000618E+00 | loss scale: 4096.0 | grad norm: 41366.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 344/ 159576 | consumed samples: 5504 | elapsed time per iteration (ms): 13566.5 | learning rate: 1.527E-06 | global batch size: 16 | lm loss: 8.106353E+00 | loss scale: 4096.0 | grad norm: 86487.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
345/ 159576 | consumed samples: 5520 | elapsed time per iteration (ms): 13617.7 | learning rate: 1.531E-06 | global batch size: 16 | lm loss: 8.130958E+00 | loss scale: 4096.0 | grad norm: 65559.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 346/ 159576 | consumed samples: 5536 | elapsed time per iteration (ms): 14006.3 | learning rate: 1.536E-06 | global batch size: 16 | lm loss: 8.100373E+00 | loss scale: 4096.0 | grad norm: 50918.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 347/ 159576 | consumed samples: 5552 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.540E-06 | global batch size: 16 | lm loss: 8.193462E+00 | loss scale: 4096.0 | grad norm: 49482.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 348/ 159576 | consumed samples: 5568 | elapsed time per iteration (ms): 13785.4 | learning rate: 1.544E-06 | global batch size: 16 | lm loss: 8.185720E+00 | loss scale: 4096.0 | grad norm: 33616.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 349/ 159576 | consumed samples: 5584 | elapsed time per iteration (ms): 13534.7 | learning rate: 1.549E-06 | global batch size: 16 | lm loss: 7.997324E+00 | loss scale: 4096.0 | grad norm: 41224.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 350/ 159576 | consumed samples: 5600 | elapsed time per iteration (ms): 14148.0 | learning rate: 1.553E-06 | global batch size: 16 | lm loss: 8.069170E+00 | loss scale: 4096.0 | grad norm: 61139.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 351/ 159576 | consumed samples: 5616 | elapsed time per iteration (ms): 13626.0 | learning rate: 1.558E-06 | global batch size: 16 | lm loss: 8.052499E+00 | loss scale: 4096.0 | grad norm: 58965.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 352/ 159576 | consumed samples: 5632 | elapsed time per iteration (ms): 13633.5 | learning rate: 1.562E-06 | global batch size: 16 | lm loss: 8.036291E+00 | loss scale: 4096.0 | grad norm: 38820.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 353/ 159576 | consumed samples: 5648 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.567E-06 | global batch size: 16 | lm loss: 8.007360E+00 | loss scale: 4096.0 | grad norm: 33342.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 354/ 159576 | consumed samples: 5664 | elapsed time per iteration (ms): 13707.0 | learning rate: 1.571E-06 | global batch size: 16 | lm loss: 7.890161E+00 | loss scale: 4096.0 | grad norm: 62589.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 355/ 159576 | consumed samples: 5680 | elapsed time per iteration (ms): 14101.4 | learning rate: 1.575E-06 | global batch size: 16 | lm loss: 8.034273E+00 | loss scale: 4096.0 | grad norm: 62100.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 356/ 159576 | consumed samples: 5696 | elapsed time per iteration (ms): 13548.4 | learning rate: 1.580E-06 | global batch size: 16 | lm loss: 7.964279E+00 | loss scale: 4096.0 | grad norm: 37283.643 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 357/ 159576 | consumed samples: 5712 | elapsed time per iteration (ms): 13655.3 | learning rate: 1.584E-06 | global batch size: 16 | lm loss: 7.882459E+00 | loss scale: 4096.0 | grad norm: 36278.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 358/ 159576 | consumed samples: 5728 | elapsed time per iteration (ms): 13872.1 | learning rate: 1.589E-06 | global batch size: 16 | lm loss: 8.081428E+00 | loss scale: 4096.0 | grad norm: 59624.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 359/ 159576 | consumed samples: 5744 | elapsed time per iteration (ms): 13830.3 | learning rate: 1.593E-06 | global batch size: 16 | lm loss: 8.345490E+00 | loss scale: 4096.0 | grad norm: 101818.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 360/ 159576 | consumed samples: 5760 | elapsed time per iteration (ms): 13738.3 | learning rate: 1.598E-06 | global batch size: 16 | lm loss: 8.090802E+00 | loss scale: 4096.0 | grad norm: 37735.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 361/ 159576 | consumed samples: 5776 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.602E-06 | global batch size: 16 | lm loss: 7.934822E+00 | loss scale: 4096.0 | grad norm: 35051.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 362/ 159576 | consumed samples: 5792 | elapsed time per iteration (ms): 13779.0 | learning rate: 1.607E-06 | global batch size: 16 | lm loss: 8.217977E+00 | loss scale: 4096.0 | grad norm: 81671.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 363/ 159576 | consumed samples: 5808 | elapsed time per iteration (ms): 14148.6 | learning rate: 1.611E-06 | global batch size: 16 | lm loss: 7.956856E+00 | loss scale: 4096.0 | grad norm: 123728.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 364/ 159576 | consumed samples: 5824 | elapsed time per iteration (ms): 13509.6 | learning rate: 1.615E-06 | global batch size: 16 | lm loss: 7.980748E+00 | loss scale: 4096.0 | grad norm: 64323.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 365/ 159576 | consumed samples: 5840 | elapsed time per iteration (ms): 13791.1 | learning rate: 1.620E-06 | global batch size: 16 | lm loss: 7.927495E+00 | loss scale: 4096.0 | grad norm: 38595.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 366/ 159576 | consumed samples: 5856 | elapsed time per iteration (ms): 13535.8 | learning rate: 1.624E-06 | global batch size: 16 | lm loss: 7.992770E+00 | loss scale: 4096.0 | grad norm: 34786.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 367/ 159576 | consumed samples: 5872 | elapsed time per iteration (ms): 13709.6 | learning rate: 1.629E-06 | global batch size: 16 | lm loss: 8.033854E+00 | loss scale: 4096.0 | grad norm: 26681.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 368/ 159576 | consumed samples: 5888 | elapsed time per iteration (ms): 13923.8 | learning rate: 1.633E-06 | global batch size: 16 | lm loss: 8.086361E+00 | 
loss scale: 4096.0 | grad norm: 116063.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 369/ 159576 | consumed samples: 5904 | elapsed time per iteration (ms): 13743.2 | learning rate: 1.638E-06 | global batch size: 16 | lm loss: 8.136069E+00 | loss scale: 4096.0 | grad norm: 192843.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 370/ 159576 | consumed samples: 5920 | elapsed time per iteration (ms): 13586.5 | learning rate: 1.642E-06 | global batch size: 16 | lm loss: 8.213842E+00 | loss scale: 4096.0 | grad norm: 66749.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 371/ 159576 | consumed samples: 5936 | elapsed time per iteration (ms): 13637.5 | learning rate: 1.646E-06 | global batch size: 16 | lm loss: 7.862526E+00 | loss scale: 4096.0 | grad norm: 35628.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 372/ 159576 | consumed samples: 5952 | elapsed time per iteration (ms): 14269.3 | learning rate: 1.651E-06 | global batch size: 16 | lm loss: 8.111351E+00 | loss scale: 4096.0 | grad norm: 51284.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 373/ 159576 | consumed samples: 5968 | elapsed time per iteration (ms): 13424.8 | learning rate: 1.655E-06 | global batch size: 16 | lm loss: 7.860275E+00 | loss scale: 4096.0 | grad norm: 51885.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 374/ 159576 | consumed samples: 5984 | elapsed time per iteration (ms): 13638.9 | learning rate: 1.660E-06 | global batch size: 16 | lm loss: 7.995843E+00 | loss scale: 4096.0 | grad norm: 40982.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 375/ 159576 | consumed samples: 6000 | elapsed time per iteration (ms): 13719.8 | learning rate: 1.664E-06 | global batch size: 16 | lm loss: 7.989121E+00 | loss scale: 4096.0 | grad norm: 43694.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 376/ 159576 | consumed samples: 6016 | elapsed time per iteration (ms): 13718.2 | learning rate: 1.669E-06 | global batch size: 16 | lm loss: 8.054690E+00 | loss scale: 4096.0 | grad norm: 56142.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 377/ 159576 | consumed samples: 6032 | elapsed time per iteration (ms): 14087.0 | learning rate: 1.673E-06 | global batch size: 16 | lm loss: 8.145277E+00 | loss scale: 4096.0 | grad norm: 77837.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 378/ 159576 | consumed samples: 6048 | elapsed time per iteration (ms): 13621.7 | learning rate: 1.678E-06 | global batch size: 16 | lm loss: 7.879861E+00 | loss scale: 4096.0 | grad norm: 35054.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 379/ 159576 | consumed samples: 6064 | elapsed time per iteration (ms): 13676.7 | learning rate: 1.682E-06 | global batch size: 16 | lm loss: 7.996103E+00 | loss scale: 4096.0 | grad norm: 31871.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 380/ 159576 | consumed samples: 6080 | elapsed time per iteration (ms): 13756.2 | 
learning rate: 1.686E-06 | global batch size: 16 | lm loss: 7.788074E+00 | loss scale: 4096.0 | grad norm: 30378.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 381/ 159576 | consumed samples: 6096 | elapsed time per iteration (ms): 13731.7 | learning rate: 1.691E-06 | global batch size: 16 | lm loss: 7.998044E+00 | loss scale: 4096.0 | grad norm: 78167.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 382/ 159576 | consumed samples: 6112 | elapsed time per iteration (ms): 13696.8 | learning rate: 1.695E-06 | global batch size: 16 | lm loss: 8.001510E+00 | loss scale: 4096.0 | grad norm: 57981.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 383/ 159576 | consumed samples: 6128 | elapsed time per iteration (ms): 13688.0 | learning rate: 1.700E-06 | global batch size: 16 | lm loss: 8.043833E+00 | loss scale: 4096.0 | grad norm: 40631.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 384/ 159576 | consumed samples: 6144 | elapsed time per iteration (ms): 13680.4 | learning rate: 1.704E-06 | global batch size: 16 | lm loss: 8.029270E+00 | loss scale: 4096.0 | grad norm: 31579.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 385/ 159576 | consumed samples: 6160 | elapsed time per iteration (ms): 14057.5 | learning rate: 1.709E-06 | global batch size: 16 | lm loss: 8.156369E+00 | loss scale: 4096.0 | grad norm: 87842.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 386/ 159576 | consumed samples: 6176 | elapsed time per iteration (ms): 13765.1 | learning rate: 1.713E-06 | global batch size: 16 | lm loss: 8.024692E+00 | loss scale: 4096.0 | grad norm: 56881.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 387/ 159576 | consumed samples: 6192 | elapsed time per iteration (ms): 13768.8 | learning rate: 1.717E-06 | global batch size: 16 | lm loss: 7.997876E+00 | loss scale: 4096.0 | grad norm: 31105.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 388/ 159576 | consumed samples: 6208 | elapsed time per iteration (ms): 13433.5 | learning rate: 1.722E-06 | global batch size: 16 | lm loss: 7.985063E+00 | loss scale: 4096.0 | grad norm: 78090.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 389/ 159576 | consumed samples: 6224 | elapsed time per iteration (ms): 13675.2 | learning rate: 1.726E-06 | global batch size: 16 | lm loss: 7.926050E+00 | loss scale: 4096.0 | grad norm: 61534.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 390/ 159576 | consumed samples: 6240 | elapsed time per iteration (ms): 13989.4 | learning rate: 1.731E-06 | global batch size: 16 | lm loss: 7.938218E+00 | loss scale: 4096.0 | grad norm: 37749.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 391/ 159576 | consumed samples: 6256 | elapsed time per iteration (ms): 13663.4 | learning rate: 1.735E-06 | global batch size: 16 | lm loss: 7.835842E+00 | loss scale: 4096.0 | grad norm: 48700.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 392/ 159576 | consumed samples: 6272 | elapsed time per iteration (ms): 13682.5 | learning rate: 1.740E-06 | global batch size: 16 | lm loss: 7.976984E+00 | loss scale: 4096.0 | grad norm: 45273.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 393/ 159576 | consumed samples: 6288 | elapsed time per iteration (ms): 13680.3 | learning rate: 1.744E-06 | global batch size: 16 | lm loss: 8.063533E+00 | loss scale: 4096.0 | grad norm: 62966.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 394/ 159576 | consumed samples: 6304 | elapsed time per iteration (ms): 14158.6 | learning rate: 1.749E-06 | global batch size: 16 | lm loss: 7.962408E+00 | loss scale: 4096.0 | grad norm: 38917.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 395/ 159576 | consumed samples: 6320 | elapsed time per iteration (ms): 13412.3 | learning rate: 1.753E-06 | global batch size: 16 | lm loss: 7.930057E+00 | loss scale: 4096.0 | grad norm: 59046.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 396/ 159576 | consumed samples: 6336 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.757E-06 | global batch size: 16 | lm loss: 8.137497E+00 | loss scale: 4096.0 | grad norm: 51299.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 397/ 159576 | consumed samples: 6352 | elapsed time per iteration (ms): 13706.0 | learning rate: 1.762E-06 | global batch size: 16 | lm loss: 8.020626E+00 | loss scale: 4096.0 | grad norm: 37056.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 398/ 159576 | consumed samples: 6368 | elapsed time per iteration (ms): 14158.0 | learning rate: 1.766E-06 | global batch size: 16 | lm loss: 8.114269E+00 | loss scale: 4096.0 | grad norm: 64105.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 399/ 159576 | consumed samples: 6384 | elapsed time per iteration (ms): 13628.9 | learning rate: 1.771E-06 | global batch size: 16 | lm loss: 8.186448E+00 | loss scale: 4096.0 | grad norm: 55633.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 400/ 159576 | consumed samples: 6400 | elapsed time per iteration (ms): 13727.5 | learning rate: 1.775E-06 | global batch size: 16 | lm loss: 8.182411E+00 | loss scale: 4096.0 | grad norm: 51312.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 401/ 159576 | consumed samples: 6416 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.780E-06 | global batch size: 16 | lm loss: 8.020710E+00 | loss scale: 4096.0 | grad norm: 32983.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 402/ 159576 | consumed samples: 6432 | elapsed time per iteration (ms): 13473.4 | learning rate: 1.784E-06 | global batch size: 16 | lm loss: 7.970335E+00 | loss scale: 4096.0 | grad norm: 70699.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 403/ 159576 | consumed samples: 6448 | elapsed time per iteration (ms): 13904.7 | learning rate: 1.788E-06 | global batch size: 16 | lm loss: 7.993033E+00 | loss scale: 4096.0 | grad norm: 67107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 404/ 159576 | consumed samples: 6464 | elapsed time per iteration (ms): 13683.9 | learning rate: 1.793E-06 | global batch size: 16 | lm loss: 8.091874E+00 | loss scale: 4096.0 | grad norm: 26716.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 405/ 159576 | consumed samples: 6480 | elapsed time per iteration (ms): 13642.3 | learning rate: 1.797E-06 | global batch size: 16 | lm loss: 8.088682E+00 | loss scale: 4096.0 | grad norm: 74507.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 406/ 159576 | consumed samples: 6496 | elapsed time per iteration (ms): 13688.7 | learning rate: 1.802E-06 | global batch size: 16 | lm loss: 8.134460E+00 | loss scale: 4096.0 | grad norm: 64155.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 407/ 159576 | consumed samples: 6512 | elapsed time per iteration (ms): 14175.7 | learning rate: 1.806E-06 | global batch size: 16 | lm loss: 8.105555E+00 | loss scale: 4096.0 | grad norm: 39464.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 408/ 159576 | consumed samples: 6528 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.811E-06 | global batch size: 16 | lm loss: 7.988219E+00 | loss scale: 4096.0 | grad norm: 39779.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 409/ 159576 | consumed samples: 6544 | elapsed time per iteration (ms): 13499.5 | learning rate: 1.815E-06 | global batch size: 16 | lm loss: 7.931721E+00 | loss scale: 4096.0 | grad norm: 46421.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 410/ 159576 | consumed samples: 6560 | elapsed time per iteration (ms): 13608.5 | learning rate: 1.820E-06 | global batch size: 16 | lm loss: 7.944845E+00 | loss scale: 4096.0 | grad norm: 28537.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 411/ 159576 | consumed samples: 6576 | elapsed time per iteration (ms): 14088.6 | learning rate: 1.824E-06 | global batch size: 16 | lm loss: 7.955441E+00 | loss scale: 4096.0 | grad norm: 68818.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 412/ 159576 | consumed samples: 6592 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.828E-06 | global batch size: 16 | lm loss: 8.293702E+00 | loss scale: 4096.0 | grad norm: 73315.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 413/ 159576 | consumed samples: 6608 | elapsed time per iteration (ms): 13670.1 | learning rate: 1.833E-06 | global batch size: 16 | lm loss: 7.982622E+00 | loss scale: 4096.0 | grad norm: 40882.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 414/ 159576 | consumed samples: 6624 | elapsed time per iteration (ms): 13753.2 | learning rate: 1.837E-06 | global batch size: 16 | lm loss: 7.981937E+00 | loss scale: 4096.0 | grad norm: 34929.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 415/ 159576 | consumed samples: 6640 | elapsed time per iteration (ms): 13749.7 | learning rate: 1.842E-06 | global batch size: 16 | lm loss: 8.060836E+00 | loss scale: 4096.0 | grad norm: 47572.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 416/ 159576 | consumed samples: 6656 | elapsed time per iteration (ms): 13758.6 | learning rate: 1.846E-06 | global batch size: 16 | lm loss: 8.002974E+00 | loss scale: 4096.0 | grad norm: 37872.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 417/ 159576 | consumed samples: 6672 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.851E-06 | global batch size: 16 | lm loss: 7.972270E+00 | loss scale: 4096.0 | grad norm: 44233.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 418/ 159576 | consumed samples: 6688 | elapsed time per iteration (ms): 13571.0 | learning rate: 1.855E-06 | global batch size: 16 | lm loss: 8.249717E+00 | loss scale: 4096.0 | grad norm: 60770.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 419/ 159576 | consumed samples: 6704 | elapsed time per iteration (ms): 13598.5 | learning rate: 1.859E-06 | global batch size: 16 | lm loss: 7.861569E+00 | loss scale: 4096.0 | grad norm: 31277.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 420/ 159576 | consumed samples: 6720 | elapsed time per iteration (ms): 14077.1 | learning rate: 1.864E-06 | global batch size: 16 | lm loss: 7.965170E+00 | loss scale: 4096.0 | grad norm: 72793.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 421/ 159576 | consumed samples: 6736 | elapsed time per iteration (ms): 13383.0 | learning rate: 1.868E-06 | global batch size: 16 | lm loss: 7.907632E+00 | loss scale: 4096.0 | grad norm: 60405.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 422/ 159576 | consumed samples: 6752 | elapsed time per iteration (ms): 13739.1 | learning rate: 1.873E-06 | global batch size: 16 | lm loss: 8.041030E+00 | loss scale: 4096.0 | grad norm: 49156.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 423/ 159576 | consumed samples: 6768 | elapsed time per iteration (ms): 13364.3 | learning rate: 1.877E-06 | global batch size: 16 | lm loss: 7.965994E+00 | loss scale: 4096.0 | grad norm: 37382.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 424/ 159576 | consumed samples: 6784 | elapsed time per iteration (ms): 13509.2 | learning rate: 1.882E-06 | global batch size: 16 | lm loss: 7.979969E+00 | loss scale: 4096.0 | grad norm: 30214.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 425/ 159576 | consumed samples: 6800 | elapsed time per iteration (ms): 13784.5 | learning rate: 1.886E-06 | global batch size: 16 | lm loss: 7.877289E+00 | loss scale: 4096.0 | grad norm: 31571.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 426/ 159576 | consumed samples: 6816 | elapsed time per iteration (ms): 13491.5 | learning rate: 1.891E-06 | global batch size: 16 | lm loss: 8.049381E+00 | loss scale: 4096.0 | grad norm: 61185.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 427/ 159576 | consumed samples: 6832 | elapsed time per iteration (ms): 13530.6 | learning rate: 1.895E-06 | global batch size: 16 | lm loss: 7.963693E+00 | loss scale: 4096.0 | grad norm: 45639.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 428/ 159576 | consumed samples: 6848 | elapsed time per iteration (ms): 13594.4 | learning rate: 1.899E-06 | global batch size: 16 | lm loss: 7.874112E+00 | loss scale: 4096.0 | grad norm: 34163.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 429/ 159576 | consumed samples: 6864 | elapsed time per iteration (ms): 14157.2 | learning rate: 1.904E-06 | global batch size: 16 | lm loss: 8.141135E+00 | loss scale: 4096.0 | grad norm: 43864.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 430/ 159576 | consumed samples: 6880 | elapsed time per iteration (ms): 13539.3 | learning rate: 1.908E-06 | global batch size: 16 | lm loss: 7.883408E+00 | loss scale: 4096.0 | grad norm: 38957.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 431/ 159576 | consumed samples: 6896 | elapsed time per iteration (ms): 13542.5 | learning rate: 1.913E-06 | global batch size: 16 | lm loss: 7.858832E+00 | loss scale: 4096.0 | grad norm: 26292.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 432/ 159576 | consumed samples: 6912 | elapsed time per iteration (ms): 13843.5 | learning rate: 1.917E-06 | global batch size: 16 | lm loss: 7.901114E+00 | loss scale: 4096.0 | grad norm: 65782.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 433/ 159576 | consumed samples: 6928 | elapsed time per iteration (ms): 13570.9 | learning rate: 1.922E-06 | global batch size: 16 | lm loss: 8.025250E+00 | loss scale: 4096.0 | grad norm: 99671.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 434/ 159576 | consumed samples: 6944 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.926E-06 | global batch size: 16 | lm loss: 7.512252E+00 | loss scale: 4096.0 | grad norm: 55130.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 435/ 159576 | consumed samples: 6960 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.930E-06 | global batch size: 16 | lm loss: 7.858408E+00 | loss scale: 4096.0 | grad norm: 33670.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 436/ 159576 | consumed samples: 6976 | elapsed time per iteration (ms): 13679.8 | learning rate: 1.935E-06 | global batch size: 16 | lm loss: 7.844939E+00 | loss scale: 4096.0 | grad norm: 39814.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 437/ 159576 | consumed samples: 6992 | elapsed time per iteration (ms): 13689.9 | learning rate: 1.939E-06 | global batch size: 16 | lm loss: 8.013271E+00 | loss scale: 4096.0 | grad norm: 62672.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 438/ 159576 | consumed samples: 7008 | elapsed time per iteration (ms): 13781.3 | learning rate: 1.944E-06 | global batch size: 16 | lm loss: 7.903483E+00 | loss scale: 4096.0 | grad norm: 41414.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 439/ 159576 | consumed samples: 7024 | elapsed time per iteration (ms): 13527.3 | learning rate: 1.948E-06 | global batch size: 16 | lm loss: 8.131282E+00 | loss scale: 4096.0 | grad norm: 32283.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 440/ 159576 | consumed samples: 7040 | elapsed time per iteration (ms): 13501.3 | learning rate: 1.953E-06 | global batch size: 16 | lm loss: 7.865626E+00 | loss scale: 4096.0 | grad norm: 35041.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 441/ 159576 | consumed samples: 7056 | elapsed time per iteration (ms): 13519.5 | learning rate: 1.957E-06 | global batch size: 16 | lm loss: 7.741554E+00 | loss scale: 4096.0 | grad norm: 36249.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 442/ 159576 | consumed samples: 7072 | elapsed time per iteration (ms): 14043.2 | learning rate: 1.962E-06 | global batch size: 16 | lm loss: 7.954229E+00 | loss scale: 4096.0 | grad norm: 73161.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 443/ 159576 | consumed samples: 7088 | elapsed time per iteration (ms): 13566.1 | learning rate: 1.966E-06 | global batch size: 16 | lm loss: 7.943119E+00 | loss scale: 4096.0 | grad norm: 46167.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 444/ 159576 | consumed samples: 7104 | elapsed time per iteration (ms): 13755.3 | learning rate: 1.970E-06 | global batch size: 16 | lm loss: 7.861948E+00 | loss scale: 4096.0 | grad norm: 37826.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 445/ 159576 | consumed samples: 7120 | elapsed time per iteration (ms): 13434.4 | learning rate: 1.975E-06 | global batch size: 16 | lm loss: 7.838496E+00 | loss scale: 4096.0 | grad norm: 56817.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 446/ 159576 | consumed samples: 7136 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.979E-06 | global batch size: 16 | lm loss: 7.932389E+00 | loss scale: 4096.0 | grad norm: 38213.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 447/ 159576 | consumed samples: 7152 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.984E-06 | global batch size: 16 | lm loss: 7.808257E+00 | loss scale: 4096.0 | grad norm: 37539.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 448/ 159576 | consumed samples: 7168 | elapsed time per iteration (ms): 13428.4 | learning rate: 1.988E-06 | global batch size: 16 | lm loss: 7.818873E+00 | loss scale: 4096.0 | grad norm: 58774.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 449/ 159576 | consumed samples: 7184 | elapsed time per iteration (ms): 13533.7 | learning rate: 1.993E-06 | global batch size: 16 | lm loss: 8.147743E+00 | loss scale: 4096.0 | grad norm: 62996.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 450/ 159576 | consumed samples: 7200 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.997E-06 | global batch size: 16 | lm loss: 8.094215E+00 | loss scale: 4096.0 | grad norm: 28180.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 451/ 159576 | consumed samples: 7216 | elapsed time per iteration (ms): 14132.6 | learning rate: 2.001E-06 | global batch size: 16 | lm loss: 7.781518E+00 | loss scale: 4096.0 | grad norm: 44504.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 452/ 159576 | consumed samples: 7232 | elapsed time per iteration (ms): 13368.4 | learning rate: 2.006E-06 | global batch size: 16 | lm loss: 8.044688E+00 | loss scale: 4096.0 | grad norm: 88794.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 453/ 159576 | consumed samples: 7248 | elapsed time per iteration (ms): 13584.3 | learning rate: 2.010E-06 | global batch size: 16 | lm loss: 7.851390E+00 | loss scale: 4096.0 | grad norm: 63860.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 454/ 159576 | consumed samples: 7264 | elapsed time per iteration (ms): 13723.9 | learning rate: 2.015E-06 | global batch size: 16 | lm loss: 7.919715E+00 | loss scale: 4096.0 | grad norm: 52314.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 455/ 159576 | consumed samples: 7280 | elapsed time per iteration (ms): 13869.1 | learning rate: 2.019E-06 | global batch size: 16 | lm loss: 7.873841E+00 | loss scale: 4096.0 | grad norm: 34440.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 456/ 159576 | consumed samples: 7296 | elapsed time per iteration (ms): 13582.9 | learning rate: 2.024E-06 | global batch size: 16 | lm loss: 8.021425E+00 | loss scale: 4096.0 | grad norm: 38108.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 457/ 159576 | consumed samples: 7312 | elapsed time per iteration (ms): 13563.2 | learning rate: 2.028E-06 | global batch size: 16 | lm loss: 8.019066E+00 | loss scale: 4096.0 | grad norm: 24882.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 458/ 159576 | consumed samples: 7328 | elapsed time per iteration (ms): 13638.8 | learning rate: 2.033E-06 | global batch size: 16 | lm loss: 8.016552E+00 | loss scale: 4096.0 | grad norm: 20634.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 459/ 159576 | consumed samples: 7344 | elapsed time per iteration (ms): 13616.8 | learning rate: 2.037E-06 | global batch size: 16 | lm loss: 7.754219E+00 | loss scale: 4096.0 | grad norm: 43242.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 460/ 159576 | consumed samples: 7360 | elapsed time per iteration (ms): 13985.2 | learning rate: 2.041E-06 | global batch size: 16 | lm loss: 7.788671E+00 | loss scale: 4096.0 | grad norm: 38608.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 461/ 159576 | consumed samples: 7376 | elapsed time per iteration (ms): 13736.9 | learning rate: 2.046E-06 | global batch size: 16 | lm loss: 7.806537E+00 | loss scale: 4096.0 | grad norm: 32594.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 462/ 159576 | consumed samples: 7392 | elapsed time per iteration (ms): 13386.0 | learning rate: 2.050E-06 | global batch size: 16 | lm loss: 7.940393E+00 | loss scale: 4096.0 | grad norm: 27037.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 463/ 159576 | consumed samples: 7408 | elapsed time per iteration (ms): 13564.9 | learning rate: 2.055E-06 | global batch size: 16 | lm loss: 7.988055E+00 | loss scale: 4096.0 | grad norm: 27394.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 464/ 159576 | consumed samples: 7424 | elapsed time per iteration (ms): 14013.6 | learning rate: 2.059E-06 | global batch size: 16 | lm loss: 8.004810E+00 | loss scale: 4096.0 | grad norm: 43759.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 465/ 159576 | consumed samples: 7440 | elapsed time per iteration (ms): 13546.2 | learning rate: 2.064E-06 | global batch size: 16 | lm loss: 7.704327E+00 | loss scale: 4096.0 | grad norm: 30191.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 466/ 159576 | consumed samples: 7456 | elapsed time per iteration (ms): 13671.9 | learning rate: 2.068E-06 | global batch size: 16 | lm loss: 7.774131E+00 | loss scale: 4096.0 | grad norm: 26963.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 467/ 159576 | consumed samples: 7472 | elapsed time per iteration (ms): 13643.6 | learning rate: 2.072E-06 | global batch size: 16 | lm loss: 7.856277E+00 | loss scale: 4096.0 | grad norm: 19255.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 468/ 159576 | consumed samples: 7488 | elapsed time per iteration (ms): 13848.0 | learning rate: 2.077E-06 | global batch size: 16 | lm loss: 7.999278E+00 | loss scale: 4096.0 | grad norm: 61835.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 469/ 159576 | consumed samples: 7504 | elapsed time per iteration (ms): 13946.4 | learning rate: 2.081E-06 | global batch size: 16 | lm loss: 7.747583E+00 | loss scale: 4096.0 | grad norm: 42910.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 470/ 159576 | consumed samples: 7520 | elapsed time per iteration (ms): 13471.2 | learning rate: 2.086E-06 | global batch size: 16 | lm loss: 7.847405E+00 | loss scale: 4096.0 | grad norm: 29043.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 471/ 159576 | consumed samples: 7536 | elapsed time per iteration (ms): 13595.6 | learning rate: 2.090E-06 | global batch size: 16 | lm loss: 7.886540E+00 | loss scale: 4096.0 | grad norm: 22573.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 472/ 159576 | consumed samples: 7552 | elapsed time per iteration (ms): 13582.6 | learning rate: 2.095E-06 | global batch size: 16 | lm loss: 7.949501E+00 | loss scale: 4096.0 | grad norm: 81307.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 473/ 159576 | consumed samples: 7568 | elapsed time per iteration (ms): 13977.1 | learning rate: 2.099E-06 | global batch size: 16 | lm loss: 7.798001E+00 | loss scale: 4096.0 | grad norm: 27221.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 474/ 159576 | consumed samples: 7584 | elapsed time per iteration (ms): 13666.7 | learning rate: 2.104E-06 | global batch size: 16 | lm loss: 7.990824E+00 | loss scale: 4096.0 | grad norm: 50253.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 474 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 04:00:46,754] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step474/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 474 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17639.87
[exiting program after 110.0032222946485 minutes] datetime: 2021-09-24 04:00:58
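The per-iteration records above carry everything needed to estimate throughput: at a global batch size of 16 and roughly 13.7 s per iteration, the run processes about 1.17 samples/s. A minimal parsing sketch that recovers these numbers from one record; the `parse_fields` helper is hypothetical, not part of Megatron-DeepSpeed:

```python
# One iteration record copied from the log above (iteration 474).
line = ("iteration 474/ 159576 | consumed samples: 7584 | "
        "elapsed time per iteration (ms): 13666.7 | learning rate: 2.104E-06 | "
        "global batch size: 16 | lm loss: 7.990824E+00")

def parse_fields(record: str) -> dict:
    """Split a Megatron-style ' key: value | ' record into a dict."""
    fields = {}
    for part in record.split("|"):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields

f = parse_fields(line)
batch = int(f["global batch size"])                     # 16 samples per step
step_s = float(f["elapsed time per iteration (ms)"]) / 1000
print(f"throughput ~ {batch / step_s:.2f} samples/s")   # ~1.17 samples/s
```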
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[the launcher emits this message once per process; the duplicate copies have been elided]
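The tuning the warning asks for amounts to setting OMP_NUM_THREADS before the thread pools initialize. A hedged sketch of one common heuristic, cores divided by ranks per node; the topology numbers below are placeholders, not values from this log:

```python
import os

# Placeholder topology, not taken from this log.
cores_per_node = 40
ranks_per_node = 4

# Must be set before torch/numpy spin up their OpenMP thread pools,
# i.e. before the heavy imports below.
os.environ["OMP_NUM_THREADS"] = str(cores_per_node // ranks_per_node)

import torch  # noqa: E402  (imported after the env var is set)

print("intra-op threads:", torch.get_num_threads())
```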
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[the op report above is printed once per rank; the interleaved duplicate copies have been elided]
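Per the report, only cpu_adam was pre-built; every other op is JIT-compiled by ninja on first use, which shows up as a one-off stall when the op is first touched. If that stall matters, an op can be built ahead of time. A sketch using DeepSpeed's op builders; treat the import path as an assumption for this DeepSpeed version:

```python
# Sketch: trigger the ninja JIT build for fused_adam before training starts.
# The op_builder import path is an assumption for this DeepSpeed version.
from deepspeed.ops.op_builder import FusedAdamBuilder

builder = FusedAdamBuilder()
if builder.is_compatible():      # mirrors the "compatible" column above
    fused_adam = builder.load()  # compiles the op (or loads a cached build)
    print("fused_adam loaded:", fused_adam.__name__)
```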
[WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. This can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
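async_io is the one op reported as incompatible here: JIT-compiling it needs the libaio development headers. After installing them, the same builder interface as above can confirm the fix (again an assumption tied to the 0.4.x layout):

    # Before `apt install libaio-dev` this reproduces the [NO] in the
    # warning above; afterwards it should report True.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    print("async_io compatible:", AsyncIOBuilder().is_compatible())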
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
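The key fields of this summary can be recovered from a Python prompt using only stable attributes of the two packages (the full table itself comes from DeepSpeed's environment report, whose exact fields vary by release):

    # Reproduce the main "DeepSpeed general environment info" fields.
    import torch
    import deepspeed

    print("torch version ....", torch.__version__)      # 1.8.1 here
    print("torch cuda .......", torch.version.cuda)     # 11.1 here
    print("deepspeed info ...", deepspeed.__version__)  # 0.4.2+bc17042 here
    print("deepspeed path ...", list(deepspeed.__path__))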
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] async_io ...............utils [NO].................. .......[YES] [NO]...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] .......transformer_inference [OKAY].. stochastic_transformer . [NO] ....... [OKAY] [NO] ....... --------------------------------------------------[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... transformer_inference[NO] .. [NO] ....... [OKAY] utils .................. [YES]transformer_inference ........ [OKAY][NO] ....... quantizer .............. [OKAY][NO] ....... [OKAY] utils-------------------------------------------------- .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ninja .................. [OKAY] JIT compiled ops requires ninja -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. 
[NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... 
............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] DeepSpeed general environment info: transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils ..................utils [YES].................. [YES]...... ......[OKAY] [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed general environment info: JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ninja .................. [OKAY] transformer_inference .. [NO] ....... 
[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible sparse_attn ............ [NO] ....... 
[OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO]quantizer ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.-------------------------------------------------- -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- JIT compiled ops requires ninja torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils ..................utils [YES] ........................ [YES][OKAY] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]ninja .................. [OKAY] -------------------------------------------------- op name ................ installed sparse_attn.. ............compatible [NO]-------------------------------------------------- ....... [OKAY] transformer ............cpu_adam [NO]............... .......[YES] [OKAY]...... [OKAY] stochastic_transformer . [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ............... 
Every rank printed the same DeepSpeed setup banner:

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 fused_adam ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 fused_lamb ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 sparse_attn ............ [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer ............ [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 fused_adam ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_lamb ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] .......-------------------------------------------------- [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] JIT compiled ops requires ninja transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version .................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] ....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] -------------------------------------------------- utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ...............torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ............... 11.1torch version nvcc version.................... 
.....................1.8.1 11.2 torch cuda versiondeepspeed install path .......................... 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version deepspeed info..................... ...................11.2 0.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`....... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... 
[NO] ....... [NO] ninja .................. [OKAY] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer .............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizerutils ................................ [NO][YES] ............. [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version .....................DeepSpeed general environment info: 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. transformer_inference[NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... 
[OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info: torch version .................... 1.8.1 torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 
11.2 torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.transformer_inference .. quantizer .............. [NO] ....... [OKAY] [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] async_io...... [OKAY]............... [NO] .......quantizer [NO] .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible /bin/sh: line 0: type: git: not found -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... DeepSpeed general environment info:11.1 nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed info deepspeed wheel compiled w.................... ......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version .................... ...............1.8.1 torch cuda version ............... 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................torch version 11.2.................... deepspeed install path1.8.1 ........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ............... deepspeed info11.1 ................... nvcc version0.4.2+bc17042, bc17042, big-science ..................... deepspeed wheel compiled w.11.2 ...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version torch version............... ....................11.1 1.8.1nvcc version .....................torch cuda version 11.2............... 
deepspeed install path11.1 ........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] /bin/sh: line 0: type: git: not found [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] utils...... [OKAY].................. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. torch version .................... 1.8.1 [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
The only op flagged as not JIT-compatible on these nodes is async_io, which needs the system libaio headers:

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
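A quick way to confirm the missing dependency from Python (an illustrative stdlib check, not a DeepSpeed API):

    import ctypes.util

    # find_library returns None when the shared library is absent, which is
    # exactly the condition the [WARNING] above is reporting.
    if ctypes.util.find_library("aio") is None:
        print("libaio not found - async_io stays [NO]; fix: apt install libaio-dev")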
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
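The same fields can be read straight off the installed packages; a minimal sketch using standard torch/deepspeed attributes:

    import torch
    import deepspeed

    print("torch install path :", torch.__path__)         # conda env above
    print("torch version      :", torch.__version__)      # 1.8.1 here
    print("torch cuda version :", torch.version.cuda)     # 11.1 here
    print("deepspeed path     :", deepspeed.__path__)
    print("deepspeed info     :", deepspeed.__version__)  # 0.4.2+bc17042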
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninjaop name .................................. installed ..[OKAY] compatible ---------------------------------------------------------------------------------------------------- op name ................ installed .. cpu_adamcompatible ............... --------------------------------------------------[YES] ...... [OKAY] cpu_adam ............... [YES]fused_adam ................... [NO][OKAY] ....... [OKAY] fused_lamb ............. [NO] ....... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_lamb sparse_attn............. ............[NO] [NO] .............. [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............stochastic_transformer [NO] ........ [NO] [OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . 
[NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 sparse_attn ............ [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
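The `git: not found` lines are harmless here: the compute nodes have no `git` on `PATH`, so Megatron's git lookup falls back to `unknown`. A hedged sketch of what such a lookup typically does (the helper below is illustrative, not Megatron's actual code):

```python
# Illustrative git-info lookup with a graceful fallback -- the kind of helper
# that prints "git_hash=unknown git_branch=unknown" when git is absent.
import subprocess

def git_info(default="unknown"):
    def run(*args):
        try:
            return subprocess.check_output(
                ("git", *args), text=True, stderr=subprocess.DEVNULL
            ).strip()
        except (OSError, subprocess.CalledProcessError):
            return default  # git missing from PATH, or not a git checkout
    return run("rev-parse", "--short", "HEAD"), run("rev-parse", "--abbrev-ref", "HEAD")

git_hash, git_branch = git_info()
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")
```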
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
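The `async_io` warning only matters if NVMe offload is used; once the libaio development headers are present the op becomes JIT-compilable again. A sketch of checking that from Python, assuming `AsyncIOBuilder` is importable from `deepspeed.ops.op_builder` in this DeepSpeed version:

```python
# Sketch: check whether the async_io op can be JIT-built on this node.
# AsyncIOBuilder is assumed to live in deepspeed.ops.op_builder here;
# is_compatible() should return False until libaio headers are installed.
from deepspeed.ops.op_builder import AsyncIOBuilder

if AsyncIOBuilder().is_compatible():
    print("async_io can be JIT-compiled")
else:
    print("async_io unavailable; install the libaio dev package first")
```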
> setting tensorboard ...
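`> setting tensorboard ...` is rank 0 of the training script opening its TensorBoard writer; a minimal sketch of that step (the log directory and flush interval below are illustrative values, not the script's actual config):

```python
# Minimal sketch of the "> setting tensorboard ..." step: rank 0 opens a
# SummaryWriter that the training loop logs scalars to.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="tr1-13B-logs/tensorboard", flush_secs=120)
writer.add_scalar("train/lm-loss", 7.5, global_step=1)  # example data point
writer.close()
```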
..................[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------op name ................ op nameinstalled .................. installedcompatible .. --------------------------------------------------compatible -------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... ...... [OKAY][OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_adamop name ............................. [NO]installed ......... [OKAY]compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] .......fused_adam [OKAY] ............. [NO] transformer....... ............[OKAY] [NO] ....... [OKAY]fused_lamb ............. [NO] .......stochastic_transformer [OKAY] . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalled installed ...... .. compatible compatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............cpu_adam[YES] cpu_adam [YES] ...... [OKAY]............... ..................... [YES][OKAY][YES] ............ fused_adam[OKAY][OKAY] ............. [NO] ....... fused_adam[OKAY] ............. [NO]fused_lambfused_adam fused_adam .................... ............. .............[OKAY] [NO] [NO] [NO] .......fused_lamb ....... .................... [OKAY][NO] [OKAY] [OKAY]....... [OKAY] fused_lamb fused_lamb............. ............. [NO][NO] .......sparse_attn....... [OKAY] ............ [OKAY]sparse_attn [NO] ............ .......[NO] [OKAY]....... [OKAY] transformer ............sparse_attntransformer sparse_attn [NO]........................ ...................[NO][NO] [NO].......[OKAY]....... .......[OKAY][OKAY] stochastic_transformer[OKAY] transformer stochastic_transformer.............transformer [NO][NO]. ............ ....... [NO]....... [NO] [OKAY] ....... [OKAY].......[OKAY] [OKAY] stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [YES][OKAY] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch install path.................... 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ..................... torch version11.2 ....................deepspeed install path 1.8.1........... async_io ............... [NO] ....... [NO] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version ...............deepspeed info 11.1................... 0.4.2+bc17042, bc17042, big-sciencenvcc version deepspeed wheel compiled w...................... ......11.2 torch 1.8, cuda 11.1deepspeed install path transformer_inference .. [NO] ....... [OKAY] ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ ................installedinstalled installed installed .... ....compatible compatible compatible-------------------------------------------------- compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam............... [YES]cpu_adam ............... [YES]..................... [OKAY][YES] ............ [YES] [OKAY] [OKAY] ...... [OKAY] fused_adam ............. fused_adam[NO]fused_adam ....... fused_adam............. ............. [OKAY] [NO] .............[NO] ....... fused_lamb[NO] [OKAY] ............. ....... ....... [NO] [OKAY] [OKAY] fused_lamb....... .............[OKAY] [NO]fused_lambfused_lamb ................................. [OKAY][NO][NO] .............. sparse_attn[OKAY][OKAY] ............ [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... [OKAY]sparse_attn[NO]sparse_attn ...............................transformer [OKAY][NO]............[NO] ....... [NO] .......[OKAY]stochastic_transformer [OKAY]....... .transformer[OKAY] transformer[NO] ............................... stochastic_transformer[NO][OKAY] [NO] ............... [NO] [OKAY] [OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
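The capability table and environment summary above are emitted once per launched process; the same report can normally be regenerated on a login node with DeepSpeed's `ds_report` command. A minimal sketch for pulling the key version facts from Python (assuming the same conda environment is active):

# mirror the fields of "DeepSpeed general environment info:"
import torch
import deepspeed

print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
print("deepspeed info ...................", deepspeed.__version__)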
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
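The topology line above decomposes the 256 GPUs as data-parallel × tensor-parallel × pipeline-parallel; a quick sanity check of the arithmetic:

# world size must equal the product of the three parallelism degrees
dp, tp, pp = 8, 4, 8
assert dp * tp * pp == 256  # 8 data-parallel replicas, each a 4x8 TP/PP grid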
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1162747.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 110
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
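The rampup announced above means the global batch size climbs from 16 to 2048 in steps of 16 spread over the first 6M samples. A rough sketch of the implied schedule (this mirrors Megatron's rampup logic only approximately; the helper below is illustrative, not the actual implementation):

# rampup_batch_size = ['16', '16', '6_000_000'], global_batch_size = 2048
start, inc, ramp_samples, final = 16, 16, 6_000_000, 2048
num_increments = (final - start) // inc                # 127 increments of +16
samples_per_increment = ramp_samples / num_increments  # ~47,244 samples each

def global_batch_size(consumed_samples):
    # climbs 16 -> 2048, then stays flat
    steps = int(consumed_samples / samples_per_increment)
    return min(start + steps * inc, final)

print(global_batch_size(0), global_batch_size(3_000_000), global_batch_size(6_000_000))
# -> 16 1024 2048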
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
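The 431 dummy tokens come from padding the GPT-2 vocab up to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size = 128 × 4 = 512, so each of the 4 TP shards gets an equal slice of the embedding:

import math

orig_vocab = 50257
multiple = 128 * 4            # make_vocab_size_divisible_by * TP size
padded = math.ceil(orig_vocab / multiple) * multiple
print(padded, padded - orig_vocab)  # -> 50688 431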
> setting codecarbon ...
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-24 04:01:23,432] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
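The seeds in the line above fit the pattern used by Megatron's model_parallel_cuda_manual_seed: a fixed offset of 2718 plus the tensor-parallel rank on top of the base seed (stated here as a reading of the Megatron source, not something this log proves):

# assumed derivation, matching the values reported on global rank 0
seed, tp_rank = 42, 0
model_parallel_seed = seed + 2718 + tp_rank  # -> 2760
data_parallel_seed = seed                    # -> 42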
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.299 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
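The mismatch is only about the compiler ABI check: in the PyTorch of this era, torch.utils.cpp_extension resolves the compiler from the CXX environment variable and falls back to the ambiguous `c++` alias. A minimal sketch of pointing it at g++ before any fused kernel is built (a generic workaround under the assumption that g++ is on PATH, not something done in this run):

    # Point torch's JIT extension builder at g++ explicitly; this must run
    # before the first torch.utils.cpp_extension.load() call in the process.
    import os
    os.environ.setdefault("CXX", "g++")

    from torch.utils import cpp_extension
    # Returns True (and stays silent) once the chosen compiler matches the
    # one PyTorch was built with:
    print(cpp_extension.check_compiler_abi_compatibility(os.environ["CXX"]))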
>>> done with compiling and loading fused kernels. Compilation time: 17.207 seconds
time to initialize megatron (seconds): 4.980
[after megatron is initialized] datetime: 2021-09-24 04:01:40
building GPT model ...
[2021-09-24 04:01:41,035] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
[2021-09-24 04:01:41,037] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-24 04:01:41,037] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.36 GB, percent = 20.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: 256 processes laid out as ProcessCoord(pipe, data, model) with pipe=0..7, data=0..7, model=0..3, assigned pipe-major so that global rank = pipe*32 + data*4 + model:
  ProcessCoord(pipe=0, data=0, model=0): 0
  ProcessCoord(pipe=0, data=0, model=1): 1
  ...
  ProcessCoord(pipe=7, data=7, model=3): 255
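The layout is fully regular, so any global rank can be converted to and from its (pipe, data, model) coordinates arithmetically. A small illustrative helper for the PP=8 x DP=8 x TP=4 grid above (plain Python, not Megatron/DeepSpeed API):

    # Illustrative rank <-> (pipe, data, model) conversion for the
    # PP=8 x DP=8 x TP=4 grid in the topology dump.
    PP, DP, TP = 8, 8, 4

    def coords_of_rank(rank):
        pipe, rest = divmod(rank, DP * TP)
        data, model = divmod(rest, TP)
        return pipe, data, model

    def rank_of_coords(pipe, data, model):
        return (pipe * DP + data) * TP + model

    assert rank_of_coords(2, 0, 0) == 64      # matches the dump above
    assert coords_of_rank(255) == (7, 7, 3)   # the last entry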
[2021-09-24 04:01:42,442] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4  ( 7-10: ParallelTransformerLayerPipe)
stage=2 layers=4  (11-14: ParallelTransformerLayerPipe)
stage=3 layers=4  (15-18: ParallelTransformerLayerPipe)
stage=4 layers=4  (19-22: ParallelTransformerLayerPipe)
stage=5 layers=4  (23-26: ParallelTransformerLayerPipe)
stage=6 layers=4  (27-30: ParallelTransformerLayerPipe)
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (t, p), printed by each of the 32 (tensor, pipeline) ranks:
    1986465792 on the four tensor ranks of pipeline stage 0
    1745293312 on the four tensor ranks of each of pipeline stages 1-6
    1986498560 on the four tensor ranks of pipeline stage 7
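With partition method type:transformer, DeepSpeed balances only the layers whose class name matches "transformer" across the stages, which is why each of the 8 stages gets exactly 4 of the 32 ParallelTransformerLayerPipe blocks while the embeddings, final norm, and dtype casts ride along on the first and last stages. A back-of-the-envelope check (plain arithmetic, not DeepSpeed's partitioner):

    # 32 ParallelTransformerLayerPipe blocks spread over 8 pipeline stages:
    transformer_layers, stages = 32, 8
    per_stage, remainder = divmod(transformer_layers, stages)
    assert (per_stage, remainder) == (4, 0)   # 4 per stage, as in the log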
[2021-09-24 04:01:43,676] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-24 04:01:43,677] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-24 04:01:43,677] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.55 GB, percent = 20.1%
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
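The per-stage parameter counts above are consistent with the TOTAL_PARAMS and UNIQUE_PARAMS figures the engine prints below; a quick consistency check over the logged numbers:

    # Cross-check of the parameter counts in this log.
    tp = 4                                        # tensor-parallel degree
    stage_params = [1986465792] + [1745293312] * 6 + [1986498560]

    total = tp * sum(stage_params)
    assert total == 57_778_896_896                # TOTAL_PARAMS below

    # UNIQUE_PARAMS is lower because the embedding is tied between the first
    # and last pipeline stages, so TOTAL_PARAMS counts it twice:
    assert total - 56_814_206_976 == 964_689_920  # the doubly-counted tied embedding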
[2021-09-24 04:01:43,733] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-24 04:01:43,813] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-24 04:01:43,813] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-24 04:01:43,813] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-24 04:01:43,813] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-24 04:01:43,813] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-24 04:01:43,814] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-24 04:01:43,814] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-24 04:01:43,814] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-24 04:01:48,526] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-24 04:01:48,527] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-24 04:01:48,527] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-24 04:01:48,527] [INFO] [config.py:900:print] DeepSpeedEngine configuration (each entry below is emitted via config.py:904:print):
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 256
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 2048
  train_micro_batch_size_per_gpu 1
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 8
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
  json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-24 04:01:48,529] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-24 04:01:48,959] [INFO] [engine.py:134:__init__] per-stage layout (the four tensor-parallel ranks of each stage print identical lines, and every line also reports TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)):
  RANK=0-3     STAGE=0 LAYERS=7 [0, 7)   STAGE_PARAMS=1986465792 (1986.466M)
  RANK=32-35   STAGE=1 LAYERS=4 [7, 11)  STAGE_PARAMS=1745293312 (1745.293M)
  RANK=64-67   STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M)
  RANK=96-99   STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M)
  RANK=128-131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M)
  RANK=160-163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M)
  RANK=192-195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M)
  RANK=224-227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M)
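The batch-size and loss-scale entries in this dump are internally consistent; a two-line check in plain arithmetic (note that "world_size" in the DeepSpeed config print is the data-parallel degree, not the 256-process job size):

    # train_batch_size = micro-batch x gradient-accumulation x data-parallel degree
    assert 1 * 256 * 8 == 2048    # micro=1, gas=256, DP=8 -> 2048

    # fp16 "initial_scale_power": 12 yields the initial dynamic loss scale:
    assert 2 ** 12 == 4096        # matches initial_dynamic_scale / init_scale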
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
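These checkpoint values pin down a linear-warmup + cosine-decay schedule (given the magnitudes, the "iterations" here are counted in samples). An illustrative sketch of the resulting curve, in the spirit of Megatron's AnnealingLR but not its code:

    # Warmup + cosine decay implied by the checkpoint values above.
    import math

    LR_MAX, LR_MIN = 6e-05, 6e-06
    WARMUP, TOTAL = 216_320, 126_953_125   # counted in samples in this run

    def lr_at(samples):
        if samples < WARMUP:               # linear warmup from 0 to LR_MAX
            return LR_MAX * samples / WARMUP
        frac = (samples - WARMUP) / (TOTAL - WARMUP)
        return LR_MIN + (LR_MAX - LR_MIN) * 0.5 * (1 + math.cos(math.pi * min(frac, 1.0)))

    assert lr_at(0) == 0.0                         # matches "lr=[0.0, 0.0]" at step 0
    assert abs(lr_at(WARMUP) - LR_MAX) < 1e-12     # peak at end of warmup
    assert abs(lr_at(TOTAL) - LR_MIN) < 1e-12      # floor at end of decay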
successfully loaded 8 ZeRO state_dicts for rank 124
loading 8 zero partition checkpoints for rank 124
(one "successfully loaded 8 ZeRO state_dicts for rank N" line, followed later by one "loading 8 zero partition checkpoints for rank N" line, is printed per rank; the lines arrive interleaved in whatever order the ranks finish)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 04:02:16 CEST)" was missed by 0:00:03.600668
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 04:02:20 CEST)" was missed by 0:00:03.124446
loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 204 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 5 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 0 loading 8 zero partition checkpoints for rank 244 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 15 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 120 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 172 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 30 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 10 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 17 loading 8 zero partition checkpoints for rank 18 successfully loaded 8 ZeRO state_dicts for rank 19 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 19 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 474 time (ms) | load-checkpoint: 86577.34 [after 
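The per-rank messages above come from DeepSpeed's ZeRO checkpoint restore: every rank reads the optimizer-state partitions it owns (eight shards per rank in this run) before training resumes at iteration 474. Below is a minimal resume sketch using the engine API; the stand-in model, config path, and client-state key are illustrative assumptions, not this run's actual code.

```python
# Minimal DeepSpeed resume sketch (illustrative; the real run builds a 104B
# GPT model and its own ZeRO config). This is the call path that emits the
# per-rank "ZeRO state_dicts" / "zero partition checkpoints" messages.
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # stand-in for the real network

engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",   # assumed ZeRO config file
)

# Each rank loads its own optimizer-state partitions; with tag=None the
# engine picks up the tag recorded in the checkpoint dir's 'latest' file.
load_path, client_state = engine.load_checkpoint(
    "/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints",
    tag=None,
)
iteration = (client_state or {}).get("iteration", 0)  # 474 in the log above
print(f"resumed from {load_path} at iteration {iteration}")
```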
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 04:03:15
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes... reading pointers... reading document index...
    creating numpy buffer of mmap... creating memory view of numpy buffer...
> finished creating indexed dataset in 0.164226 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx, sample-idx and shuffle-idx mappings from
    /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
    /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
    /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.365 seconds
    total number of samples: 394611670 | total number of epochs: 3
> loading the matching valid indexmap files (meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_*.npy)
    loaded indexed file in 0.203 seconds
    total number of samples: 6927161 | total number of epochs: 1
> loading the matching test indexmap files (meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_*.npy)
    loaded indexed file in 0.072 seconds
    total number of samples: 137384 | total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 04:03:22
done with setup ...
training ...
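The three *_idx.npy mappings above are plain numpy arrays that Megatron builds once and then memory-maps on every restart, which is why the 300M-sample train split loads in well under a second. A sketch of inspecting them directly, assuming only the paths printed in the log:

```python
# The doc-idx / sample-idx / shuffle-idx arrays are ordinary .npy files that
# get memory-mapped rather than read into RAM. The filename encodes the
# request: 300000000ns = number of samples, 2048sl = sequence length,
# 42s = shuffle seed.
import numpy as np

base = ("/gpfswork/rech/six/commun/datasets-custom/oscar-en/"
        "meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_")

doc_idx = np.load(base + "doc_idx.npy", mmap_mode="r")        # document order
sample_idx = np.load(base + "sample_idx.npy", mmap_mode="r")  # (doc, offset) sample boundaries
shuffle_idx = np.load(base + "shuffle_idx.npy", mmap_mode="r")

# 3 epochs over 288,714,672 training documents yield the 394,611,670 samples
# reported in the log; asking for 300M samples therefore needs epoch 3.
print(len(shuffle_idx))  # ~394,611,670 on the original filesystem
```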
time (ms) | model-and-optimizer-setup: 94922.27 | train/valid/test-data-iterators-setup: 5644.20
[before the start of training step] datetime: 2021-09-24 04:03:22
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 04:03:22,280] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 04:03:22,281] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[2021-09-24 04:03:47] PULSE: tr8-104B is waiting to be scheduled (1159457_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is scheduled to start in 18:10:24 (at 2021-09-24T22:14:12) (1161605 on 'gpu_p13' partition)
[2021-09-24 04:03:47] PULSE: tr8-104B is running for 2:42 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])

memory (MB) after 475 iterations, per reporting rank (max reserved equals reserved in every report):
    ranks 0-3     (allocated 6661.611328125,  max allocated 11742.55810546875):
        reserved:   0: 23514 |   1: 21150 |   2: 22878 |   3: 22890
    ranks 224-227 (allocated 7107.70751953125, max allocated 11884.6845703125):
        reserved: 224: 22108 | 225: 22108 | 226: 20752 | 227: 20752
    ranks 32-195  (allocated 5861.5498046875,  max allocated 10450.46337890625):
        reserved:  32: 19012 |  33: 18442 |  34: 18586 |  35: 18442
                   64: 19012 |  65: 18826 |  66: 18586 |  67: 18586
                   96: 19012 |  97: 18442 |  98: 18522 |  99: 18442
                  128: 18884 | 129: 18586 | 130: 18586 | 131: 18522
                  160: 18868 | 161: 18442 | 162: 18442 | 163: 18442
                  192: 18884 | 193: 18778 | 194: 18442 | 195: 18442
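The per-rank memory lines map directly onto PyTorch's CUDA allocator counters; the sketch below shows how such a report can be produced (the actual Megatron helper differs in details). The higher allocation on ranks 0-3 and 224-227 is plausibly the first and last pipeline stages also carrying the embedding and output layers, though the log does not say so explicitly.

```python
# Sketch of a per-rank memory report line from torch's allocator counters.
import torch

def memory_report_mb(rank: int, iteration: int) -> str:
    mb = 1024 * 1024
    return (
        f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
        f" | allocated: {torch.cuda.memory_allocated() / mb}"
        f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
        f" | reserved: {torch.cuda.memory_reserved() / mb}"
        f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
    )

# 'reserved' (memory cached by the allocator) exceeding 'allocated' is
# normal and matches every report in the table above.
```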
iteration log, condensed to columns (iter/total | consumed samples | ms/iter | learning rate | lm loss | loss scale | grad norm; num zeros 0.0, skipped iterations 0 and nan iterations 0 on every line):
  475/159576 |  7600 | 29962.7 | 2.108E-06 | 7.833103 | 4096.0 | 47969.708
  476/159576 |  7616 | 13562.3 | 2.112E-06 | 7.715385 | 4096.0 | 28643.174
  477/159576 |  7632 | 14532.6 | 2.117E-06 | 7.912835 | 4096.0 | 18978.073
  478/159576 |  7648 | 13659.0 | 2.121E-06 | 7.845491 | 4096.0 | 29417.161
  479/159576 |  7664 | 13928.5 | 2.126E-06 | 7.818515 | 4096.0 | 24185.570
  480/159576 |  7680 | 13863.2 | 2.130E-06 | 7.759526 | 4096.0 | 18058.893
  481/159576 |  7696 | 13613.0 | 2.135E-06 | 7.666837 | 4096.0 | 21581.295
  482/159576 |  7712 | 13350.8 | 2.139E-06 | 7.929407 | 4096.0 | 22311.348
  483/159576 |  7728 | 13819.2 | 2.143E-06 | 7.786575 | 4096.0 | 23821.522
  484/159576 |  7744 | 13697.3 | 2.148E-06 | 7.834505 | 4096.0 | 18706.902
  485/159576 |  7760 | 13285.4 | 2.152E-06 | 7.796403 | 4096.0 | 23055.088
  486/159576 |  7776 | 13893.0 | 2.157E-06 | 7.853868 | 4096.0 | 16300.893
  487/159576 |  7792 | 14059.7 | 2.161E-06 | 7.943846 | 4096.0 | 18420.386
  488/159576 |  7808 | 13994.0 | 2.166E-06 | 7.850654 | 4096.0 | 17235.839
  489/159576 |  7824 | 13596.2 | 2.170E-06 | 7.825228 | 4096.0 | 16217.059
  490/159576 |  7840 | 14562.4 | 2.175E-06 | 7.944909 | 4096.0 | 20367.528
  491/159576 |  7856 | 13373.8 | 2.179E-06 | 7.772738 | 4096.0 | 14868.924
  492/159576 |  7872 | 13407.0 | 2.183E-06 | 7.807293 | 4096.0 | 12933.190
  493/159576 |  7888 | 13535.9 | 2.188E-06 | 7.796512 | 4096.0 | 14067.056
  494/159576 |  7904 | 13629.5 | 2.192E-06 | 7.792056 | 4096.0 | 14953.693
  495/159576 |  7920 | 14163.4 | 2.197E-06 | 7.703032 | 4096.0 | 14533.162
  496/159576 |  7936 | 13588.6 | 2.201E-06 | 7.740438 | 4096.0 | 13505.957
  497/159576 |  7952 | 13861.0 | 2.206E-06 | 7.741710 | 4096.0 | 15979.829
  498/159576 |  7968 | 13984.2 | 2.210E-06 | 7.999316 | 4096.0 | 17409.113
  499/159576 |  7984 | 13944.3 | 2.214E-06 | 7.852047 | 4096.0 | 17274.017
  500/159576 |  8000 | 13842.0 | 2.219E-06 | 7.828729 | 8192.0 | 13323.901
  501/159576 |  8016 | 13887.5 | 2.223E-06 | 7.889397 | 8192.0 | 36733.789
  502/159576 |  8032 | 14250.0 | 2.228E-06 | 7.699535 | 8192.0 | 25128.484
  503/159576 |  8048 | 14013.2 | 2.232E-06 | 7.717435 | 8192.0 | 27928.260
  504/159576 |  8064 | 13885.3 | 2.237E-06 | 7.793045 | 8192.0 | 25342.573
  505/159576 |  8080 | 14216.7 | 2.241E-06 | 7.810180 | 8192.0 | 32722.154
  506/159576 |  8096 | 13476.3 | 2.246E-06 | 7.789536 | 8192.0 | 28438.282
  507/159576 |  8112 | 13866.3 | 2.250E-06 | 7.752525 | 8192.0 | 38662.247
  508/159576 |  8128 | 14262.5 | 2.254E-06 | 7.916237 | 8192.0 | 36720.277
  509/159576 |  8144 | 13929.6 | 2.259E-06 | 7.943053 | 8192.0 | 38847.168
  510/159576 |  8160 | 13830.3 | 2.263E-06 | 7.853089 | 8192.0 | 37581.397
  511/159576 |  8176 | 13826.8 | 2.268E-06 | 7.664119 | 8192.0 | 34046.642
  512/159576 |  8192 | 14623.1 | 2.272E-06 | 7.786874 | 8192.0 | 28303.899
  513/159576 |  8208 | 13633.3 | 2.277E-06 | 7.763934 | 8192.0 | 32905.082
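Note the loss scale doubling from 4096.0 to 8192.0 at iteration 500, with zero skipped iterations so far: dynamic loss scaling grows the scale after a fixed window of overflow-free steps and halves it (skipping the step) on overflow. A self-contained sketch of that mechanism follows; the window and factors are illustrative defaults, not this run's verified config.

```python
# Hedged sketch of dynamic loss scaling, the mechanism behind the
# 4096.0 -> 8192.0 jump at iteration 500 above. Constants are assumptions.
class DynamicLossScaler:
    def __init__(self, init_scale=4096.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=500):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval  # overflow-free steps before growing
        self._good_steps = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            # the step is skipped and the scale backs off
            # (logged as a "skipped iteration")
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # e.g. 4096 -> 8192
                self._good_steps = 0

scaler = DynamicLossScaler()
for _ in range(500):
    scaler.update(found_overflow=False)
assert scaler.scale == 8192.0
```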
  (columns as above)
  514/159576 |  8224 | 13562.5 | 2.281E-06 | 7.825607 | 8192.0 | 32400.005
  515/159576 |  8240 | 13893.1 | 2.286E-06 | 7.780645 | 8192.0 | 39597.501
  516/159576 |  8256 | 13943.0 | 2.290E-06 | 7.949652 | 8192.0 | 29624.844
  517/159576 |  8272 | 13457.2 | 2.294E-06 | 7.840482 | 8192.0 | 34709.122
[2021-09-24 04:13:42] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 04:13:42] PULSE: tr8-104B is running for 12:37 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
  518/159576 |  8288 | 13506.3 | 2.299E-06 | 7.914812 | 8192.0 | 24295.892
  519/159576 |  8304 | 14169.8 | 2.303E-06 | 7.710842 | 8192.0 | 32528.032
  520/159576 |  8320 | 13829.9 | 2.308E-06 | 7.806552 | 8192.0 | 37677.096
  521/159576 |  8336 | 13564.6 | 2.312E-06 | 7.817222 | 8192.0 | 30827.133
  522/159576 |  8352 | 13848.1 | 2.317E-06 | 7.805755 | 8192.0 | 31599.999
  523/159576 |  8368 | 13893.6 | 2.321E-06 | 7.845006 | 8192.0 | 34359.630
  524/159576 |  8384 | 13874.2 | 2.325E-06 | 7.806132 | 8192.0 | 34509.027
  525/159576 |  8400 | 14357.0 | 2.330E-06 | 7.713592 | 8192.0 | 36961.324
  526/159576 |  8416 | 14049.5 | 2.334E-06 | 7.744096 | 8192.0 | 46754.633
  527/159576 |  8432 | 14142.6 | 2.339E-06 | 7.798402 | 8192.0 | 38396.563
  528/159576 |  8448 | 13474.9 | 2.343E-06 | 7.987565 | 8192.0 | 36935.417
  529/159576 |  8464 | 14180.8 | 2.348E-06 | 7.766053 | 8192.0 | 35413.152
  530/159576 |  8480 | 13844.6 | 2.352E-06 | 7.906172 | 8192.0 | 26808.092
  531/159576 |  8496 | 13786.0 | 2.357E-06 | 7.840616 | 8192.0 | 38477.035
  532/159576 |  8512 | 13935.0 | 2.361E-06 | 7.367872 | 8192.0 | 51156.079
  533/159576 |  8528 | 14022.6 | 2.365E-06 | 7.941976 | 8192.0 | 46439.024
  534/159576 |  8544 | 14296.7 | 2.370E-06 | 7.869607 | 8192.0 | 29876.193
  535/159576 |  8560 | 13470.0 | 2.374E-06 | 7.635067 | 8192.0 | 34076.920
  536/159576 |  8576 | 13796.1 | 2.379E-06 | 7.842813 | 8192.0 | 41800.450
  537/159576 |  8592 | 13818.0 | 2.383E-06 | 7.984433 | 8192.0 | 38203.372
  538/159576 |  8608 | 14109.2 | 2.388E-06 | 7.724606 | 8192.0 | 44792.862
  539/159576 |  8624 | 13906.3 | 2.392E-06 | 7.800515 | 8192.0 | 32297.704
  540/159576 |  8640 | 14143.5 | 2.396E-06 | 7.871832 | 8192.0 | 43120.437
  541/159576 |  8656 | 14084.0 | 2.401E-06 | 7.872537 | 8192.0 | 36867.265
  542/159576 |  8672 | 13874.8 | 2.405E-06 | 7.777860 | 8192.0 | 43001.704
  543/159576 |  8688 | 13779.4 | 2.410E-06 | 7.682357 | 8192.0 | 57139.433
  544/159576 |  8704 | 14017.8 | 2.414E-06 | 7.819186 | 8192.0 | 29983.983
  545/159576 |  8720 | 13847.0 | 2.419E-06 | 7.843667 | 8192.0 | 66015.612
  546/159576 |  8736 | 13982.1 | 2.423E-06 | 7.894298 | 8192.0 | 51768.956
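Some quick arithmetic on these lines: consumed samples advance by the global batch size (16) per step, and each sample is one 2048-token sequence (the 2048sl in the indexmap filenames above), so at ~14 s per iteration the global throughput here is only about 1.1 samples/s. A worked check, derived from nothing but the logged numbers:

```python
# Throughput arithmetic on the iteration log above.
GLOBAL_BATCH_SIZE = 16
SEQ_LEN = 2048

def throughput(elapsed_ms: float) -> tuple[float, float]:
    """Return (samples/sec, tokens/sec) for one logged iteration."""
    sec = elapsed_ms / 1000.0
    samples_per_sec = GLOBAL_BATCH_SIZE / sec
    return samples_per_sec, samples_per_sec * SEQ_LEN

# iteration 546 logged 13982.1 ms:
s, t = throughput(13982.1)
print(f"{s:.2f} samples/s, {t:.0f} tokens/s")  # ~1.14 samples/s, ~2344 tokens/s

# consumed-samples check: 546 iterations * 16 samples = 8736, matching the log
assert 546 * GLOBAL_BATCH_SIZE == 8736
```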
  (columns as above)
  547/159576 |  8752 | 14302.0 | 2.428E-06 | 7.715273 | 8192.0 | 39105.868
  548/159576 |  8768 | 14035.0 | 2.432E-06 | 7.707379 | 8192.0 | 39549.896
  549/159576 |  8784 | 13590.6 | 2.436E-06 | 7.786090 | 8192.0 | 29894.490
  550/159576 |  8800 | 13742.1 | 2.441E-06 | 7.726188 | 8192.0 | 34821.397
  551/159576 |  8816 | 13975.5 | 2.445E-06 | 7.823754 | 8192.0 | 41726.396
  552/159576 |  8832 | 13862.7 | 2.450E-06 | 7.780801 | 8192.0 | 39107.293
  553/159576 |  8848 | 13828.8 | 2.454E-06 | 7.722218 | 8192.0 | 34436.410
  554/159576 |  8864 | 14180.4 | 2.459E-06 | 7.731545 | 8192.0 | 26819.965
  555/159576 |  8880 | 14282.2 | 2.463E-06 | 7.705241 | 8192.0 | 49659.971
  556/159576 |  8896 | 13646.8 | 2.467E-06 | 8.003874 | 8192.0 | 37645.277
  557/159576 |  8912 | 13958.8 | 2.472E-06 | 7.782984 | 8192.0 | 61655.017
  558/159576 |  8928 | 13955.4 | 2.476E-06 | 7.811559 | 8192.0 | 48428.452
  559/159576 |  8944 | 14457.4 | 2.481E-06 | 7.931767 | 8192.0 | 38443.785
  560/159576 |  8960 | 13823.4 | 2.485E-06 | 7.793911 | 8192.0 | 40207.993
  561/159576 |  8976 | 13982.4 | 2.490E-06 | 7.842747 | 8192.0 | 36711.017
  562/159576 |  8992 | 14372.1 | 2.494E-06 | 7.878882 | 8192.0 | 54306.049
  563/159576 |  9008 | 13678.7 | 2.499E-06 | 7.849220 | 8192.0 | 37543.010
  564/159576 |  9024 | 14069.8 | 2.503E-06 | 7.844311 | 8192.0 | 44716.799
  565/159576 |  9040 | 13957.6 | 2.507E-06 | 7.913968 | 8192.0 | 47566.400
  566/159576 |  9056 | 14044.6 | 2.512E-06 | 7.683057 | 8192.0 | 46568.215
  567/159576 |  9072 | 13881.5 | 2.516E-06 | 7.870160 | 8192.0 | 41402.594
  568/159576 |  9088 | 14311.0 | 2.521E-06 | 7.629350 | 8192.0 | 39843.869
  569/159576 |  9104 | 14124.8 | 2.525E-06 | 7.845489 | 8192.0 | 47458.318
  570/159576 |  9120 | 13702.3 | 2.530E-06 | 7.848298 | 8192.0 | 53032.711
  571/159576 |  9136 | 13866.4 | 2.534E-06 | 7.659620 | 8192.0 | 37376.686
  572/159576 |  9152 | 14443.8 | 2.538E-06 | 7.711428 | 8192.0 | 36846.713
  573/159576 |  9168 | 13723.1 | 2.543E-06 | 7.800463 | 8192.0 | 40022.109
  574/159576 |  9184 | 13313.2 | 2.547E-06 | 7.722570 | 8192.0 | 57675.937
  575/159576 |  9200 | 13533.3 | 2.552E-06 | 7.797169 | 8192.0 | 44067.573
  576/159576 |  9216 | 13750.6 | 2.556E-06 | 7.624088 | 8192.0 | 37579.519
  577/159576 |  9232 | 14117.7 | 2.561E-06 | 7.644238 | 8192.0 | 57135.338
  578/159576 |  9248 | 13229.4 | 2.565E-06 | 7.769429 | 8192.0 | 45266.144
  579/159576 |  9264 | 13610.6 | 2.570E-06 | 7.508770 | 8192.0 | 35604.839
  580/159576 |  9280 | 13468.6 | 2.574E-06 | 7.727168 | 8192.0 | 37920.954
  581/159576 |  9296 | 14350.0 | 2.578E-06 | 7.883451 | 8192.0 | 46515.319
  582/159576 |  9312 | 13963.5 | 2.583E-06 | 7.781512 | 8192.0 | 50170.474
  583/159576 |  9328 | 13557.9 | 2.587E-06 | 7.964473 | 8192.0 | 29593.283
  584/159576 |  9344 | 13684.8 | 2.592E-06 | 7.855813 | 8192.0 | 39619.717
  585/159576 |  9360 | 13900.2 | 2.596E-06 | 7.877661 | 8192.0 | 31203.205
  586/159576 |  9376 | 13512.1 | 2.601E-06 | 7.887114 | 8192.0 | 63261.561
  587/159576 |  9392 | 13501.8 | 2.605E-06 | 7.815706 | 8192.0 | 47655.867
  588/159576 |  9408 | 13350.5 | 2.609E-06 | 7.754656 | 8192.0 | 49073.965
  589/159576 |  9424 | 13532.4 | 2.614E-06 | 7.622519 | 8192.0 | 39015.125
  590/159576 |  9440 | 13725.1 | 2.618E-06 | 7.841989 | 8192.0 | 59373.276
  591/159576 |  9456 | 13818.0 | 2.623E-06 | 7.730304 | 8192.0 | 56512.310
  592/159576 |  9472 | 13289.0 | 2.627E-06 | 7.849043 | 8192.0 | 44031.624
  593/159576 |  9488 | 13614.5 | 2.632E-06 | 7.807899 | 8192.0 | 43332.506
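For trend-watching (the loss drifting around 7.6-8.0 while grad norms swing widely) it helps to parse these records back out of the raw stdout. Below is a small parser keyed only to the fields visible in this log; the regex and function name are ours, not a Megatron-DeepSpeed API:

```python
# Parse Megatron-style iteration lines, as printed in the original
# (un-tabulated) log, into dicts of floats for plotting or analysis.
import re

ITER_RE = re.compile(
    r"iteration\s+(?P<iter>\d+)/\s*\d+ \| consumed samples:\s+(?P<samples>\d+)"
    r" \| elapsed time per iteration \(ms\): (?P<ms>[\d.]+)"
    r" \| learning rate: (?P<lr>[\dE.+-]+)"
    r" \| global batch size:\s+(?P<gbs>\d+)"
    r" \| lm loss: (?P<loss>[\dE.+-]+)"
    r" \| loss scale: (?P<scale>[\d.]+)"
    r" \| grad norm: (?P<gnorm>[\d.]+)"
)

def parse_line(line: str) -> dict | None:
    m = ITER_RE.search(line)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None

rec = parse_line(
    "iteration      593/  159576 | consumed samples:         9488 | "
    "elapsed time per iteration (ms): 13614.5 | learning rate: 2.632E-06 | "
    "global batch size:    16 | lm loss: 7.807899E+00 | loss scale: 8192.0 | "
    "grad norm: 43332.506"
)
assert rec and rec["loss"] < 8.0 and rec["scale"] == 8192.0
```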
  (columns as above)
  594/159576 |  9504 | 14163.8 | 2.636E-06 | 7.765454 | 8192.0 | 57221.926
  595/159576 |  9520 | 13156.1 | 2.641E-06 | 7.647946 | 8192.0 | 61799.391
  596/159576 |  9536 | 13612.4 | 2.645E-06 | 7.788985 | 8192.0 | 47569.358
  597/159576 |  9552 | 13614.3 | 2.649E-06 | 7.796825 | 8192.0 | 34793.812
  598/159576 |  9568 | 13701.2 | 2.654E-06 | 7.797745 | 8192.0 | 78279.259
  599/159576 |  9584 | 13638.2 | 2.658E-06 | 7.724266 | 8192.0 | 52804.639
  600/159576 |  9600 | 13579.9 | 2.663E-06 | 7.820310 | 8192.0 | 37266.274
  601/159576 |  9616 | 13865.9 | 2.667E-06 | 7.770097 | 8192.0 | 35207.333
  602/159576 |  9632 | 13180.7 | 2.672E-06 | 7.816167 | 8192.0 | 38744.019
  603/159576 |  9648 | 13931.1 | 2.676E-06 | 7.817324 | 8192.0 | 36573.432
  604/159576 |  9664 | 13626.6 | 2.680E-06 | 7.730925 | 8192.0 | 34465.028
  605/159576 |  9680 | 13615.1 | 2.685E-06 | 7.862791 | 8192.0 | 36177.270
  606/159576 |  9696 | 13496.6 | 2.689E-06 | 7.773019 | 8192.0 | 41679.512
  607/159576 |  9712 | 14055.9 | 2.694E-06 | 7.785677 | 8192.0 | 37271.202
  608/159576 |  9728 | 13879.6 | 2.698E-06 | 7.825086 | 8192.0 | 47809.442
  609/159576 |  9744 | 13552.3 | 2.703E-06 | 7.740236 | 8192.0 | 52434.959
  610/159576 |  9760 | 13176.0 | 2.707E-06 | 7.737531 | 8192.0 | 48525.539
  611/159576 |  9776 | 13593.3 | 2.712E-06 | 7.592016 | 8192.0 | 43005.689
  612/159576 |  9792 | 13859.6 | 2.716E-06 | 7.717112 | 8192.0 | 39297.786
  613/159576 |  9808 | 13457.1 | 2.720E-06 | 7.876259 | 8192.0 | 46784.787
  614/159576 |  9824 | 13891.1 | 2.725E-06 | 7.783233 | 8192.0 | 55950.281
  615/159576 |  9840 | 13986.9 | 2.729E-06 | 7.671467 | 8192.0 | 37634.889
  616/159576 |  9856 | 14382.5 | 2.734E-06 | 7.716076 | 8192.0 | 39465.766
  617/159576 |  9872 | 13446.9 | 2.738E-06 | 7.701165 | 8192.0 | 33600.381
  618/159576 |  9888 | 13921.0 | 2.743E-06 | 7.846385 | 8192.0 | 34178.825
  619/159576 |  9904 | 13866.6 | 2.747E-06 | 7.788978 | 8192.0 | 39840.427
  620/159576 |  9920 | 14194.3 | 2.751E-06 | 7.718859 | 8192.0 | 35668.255
  621/159576 |  9936 | 14052.1 | 2.756E-06 | 7.815299 | 8192.0 | 65082.529
  622/159576 |  9952 | 13986.4 | 2.760E-06 | 7.647432 | 8192.0 | 30577.960
  623/159576 |  9968 | 14070.1 | 2.765E-06 | 7.470105 | 8192.0 | 49150.823
  624/159576 |  9984 | 13591.8 | 2.769E-06 | 7.751683 | 8192.0 | 37773.421
  625/159576 | 10000 | 14109.1 | 2.774E-06 | 7.850559 | 8192.0 | 49716.008
  626/159576 | 10016 | 13883.7 | 2.778E-06 | 7.761450 | 8192.0 | 40472.569
  627/159576 | 10032 | 13871.1 | 2.783E-06 | 7.638558 | 8192.0 | 32194.907
  628/159576 | 10048 | 14009.2 | 2.787E-06 | 7.602344 | 8192.0 | 48067.346
  629/159576 |
consumed samples: 10064 | elapsed time per iteration (ms): 14668.1 | learning rate: 2.791E-06 | global batch size: 16 | lm loss: 7.641259E+00 | loss scale: 8192.0 | grad norm: 36222.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 630/ 159576 | consumed samples: 10080 | elapsed time per iteration (ms): 13862.3 | learning rate: 2.796E-06 | global batch size: 16 | lm loss: 7.665779E+00 | loss scale: 8192.0 | grad norm: 42515.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 631/ 159576 | consumed samples: 10096 | elapsed time per iteration (ms): 13588.5 | learning rate: 2.800E-06 | global batch size: 16 | lm loss: 7.754525E+00 | loss scale: 8192.0 | grad norm: 49054.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 632/ 159576 | consumed samples: 10112 | elapsed time per iteration (ms): 13844.9 | learning rate: 2.805E-06 | global batch size: 16 | lm loss: 7.774928E+00 | loss scale: 8192.0 | grad norm: 45662.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 633/ 159576 | consumed samples: 10128 | elapsed time per iteration (ms): 14341.8 | learning rate: 2.809E-06 | global batch size: 16 | lm loss: 7.554594E+00 | loss scale: 8192.0 | grad norm: 60744.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 634/ 159576 | consumed samples: 10144 | elapsed time per iteration (ms): 13746.1 | learning rate: 2.814E-06 | global batch size: 16 | lm loss: 7.637143E+00 | loss scale: 8192.0 | grad norm: 49330.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 635/ 159576 | consumed samples: 10160 | elapsed time per iteration (ms): 13894.5 | learning rate: 2.818E-06 | global batch size: 16 | lm loss: 7.983640E+00 | loss scale: 8192.0 | grad norm: 49417.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 636/ 159576 | consumed samples: 10176 | elapsed time per iteration (ms): 14194.7 | learning rate: 2.822E-06 | global batch size: 16 | lm loss: 7.681066E+00 | loss scale: 8192.0 | grad norm: 61468.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 637/ 159576 | consumed samples: 10192 | elapsed time per iteration (ms): 13961.2 | learning rate: 2.827E-06 | global batch size: 16 | lm loss: 7.862648E+00 | loss scale: 8192.0 | grad norm: 72192.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 638/ 159576 | consumed samples: 10208 | elapsed time per iteration (ms): 13647.5 | learning rate: 2.831E-06 | global batch size: 16 | lm loss: 7.569575E+00 | loss scale: 8192.0 | grad norm: 45669.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 639/ 159576 | consumed samples: 10224 | elapsed time per iteration (ms): 13856.5 | learning rate: 2.836E-06 | global batch size: 16 | lm loss: 7.844266E+00 | loss scale: 8192.0 | grad norm: 36677.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 640/ 159576 | consumed samples: 10240 | elapsed time per iteration (ms): 14073.9 | learning rate: 2.840E-06 | global batch size: 16 | lm loss: 7.845327E+00 | loss scale: 8192.0 | grad norm: 96907.467 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 641/ 159576 | consumed samples: 10256 | elapsed time per iteration (ms): 13796.2 | learning rate: 2.845E-06 | global batch size: 16 | lm loss: 7.647357E+00 | loss scale: 8192.0 | grad norm: 57700.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 642/ 159576 | consumed samples: 10272 | elapsed time per iteration (ms): 14118.9 | learning rate: 2.849E-06 | global batch size: 16 | lm loss: 7.207680E+00 | loss scale: 8192.0 | grad norm: 51064.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 643/ 159576 | consumed samples: 10288 | elapsed time per iteration (ms): 14102.7 | learning rate: 2.854E-06 | global batch size: 16 | lm loss: 7.651158E+00 | loss scale: 8192.0 | grad norm: 42382.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 644/ 159576 | consumed samples: 10304 | elapsed time per iteration (ms): 14051.2 | learning rate: 2.858E-06 | global batch size: 16 | lm loss: 7.854011E+00 | loss scale: 8192.0 | grad norm: 91247.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 645/ 159576 | consumed samples: 10320 | elapsed time per iteration (ms): 13538.9 | learning rate: 2.862E-06 | global batch size: 16 | lm loss: 7.769484E+00 | loss scale: 8192.0 | grad norm: 69652.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 646/ 159576 | consumed samples: 10336 | elapsed time per iteration (ms): 14249.0 | learning rate: 2.867E-06 | global batch size: 16 | lm loss: 7.553013E+00 | loss scale: 8192.0 | grad norm: 51636.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 647/ 159576 | consumed samples: 10352 | elapsed time per iteration (ms): 13970.2 | learning rate: 2.871E-06 | global batch size: 16 | lm loss: 8.084120E+00 | loss scale: 8192.0 | grad norm: 43277.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 648/ 159576 | consumed samples: 10368 | elapsed time per iteration (ms): 13853.5 | learning rate: 2.876E-06 | global batch size: 16 | lm loss: 7.727980E+00 | loss scale: 8192.0 | grad norm: 61582.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 649/ 159576 | consumed samples: 10384 | elapsed time per iteration (ms): 13732.7 | learning rate: 2.880E-06 | global batch size: 16 | lm loss: 8.087885E+00 | loss scale: 8192.0 | grad norm: 80675.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 650/ 159576 | consumed samples: 10400 | elapsed time per iteration (ms): 14065.0 | learning rate: 2.885E-06 | global batch size: 16 | lm loss: 7.735159E+00 | loss scale: 8192.0 | grad norm: 57826.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 651/ 159576 | consumed samples: 10416 | elapsed time per iteration (ms): 14427.2 | learning rate: 2.889E-06 | global batch size: 16 | lm loss: 7.631308E+00 | loss scale: 8192.0 | grad norm: 36267.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 652/ 159576 | consumed samples: 10432 | elapsed time per iteration (ms): 13615.7 | learning rate: 2.893E-06 | global batch size: 16 | lm loss: 
7.756464E+00 | loss scale: 8192.0 | grad norm: 90673.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 653/ 159576 | consumed samples: 10448 | elapsed time per iteration (ms): 13935.6 | learning rate: 2.898E-06 | global batch size: 16 | lm loss: 7.687772E+00 | loss scale: 8192.0 | grad norm: 73567.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 654/ 159576 | consumed samples: 10464 | elapsed time per iteration (ms): 14106.4 | learning rate: 2.902E-06 | global batch size: 16 | lm loss: 7.805472E+00 | loss scale: 8192.0 | grad norm: 43212.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 655/ 159576 | consumed samples: 10480 | elapsed time per iteration (ms): 13870.0 | learning rate: 2.907E-06 | global batch size: 16 | lm loss: 7.733329E+00 | loss scale: 8192.0 | grad norm: 42721.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 656/ 159576 | consumed samples: 10496 | elapsed time per iteration (ms): 13912.1 | learning rate: 2.911E-06 | global batch size: 16 | lm loss: 7.764544E+00 | loss scale: 8192.0 | grad norm: 95237.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 657/ 159576 | consumed samples: 10512 | elapsed time per iteration (ms): 13959.6 | learning rate: 2.916E-06 | global batch size: 16 | lm loss: 7.873410E+00 | loss scale: 8192.0 | grad norm: 58039.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 658/ 159576 | consumed samples: 10528 | elapsed time per iteration (ms): 14236.4 | learning rate: 2.920E-06 | global batch size: 16 | lm loss: 7.776018E+00 | loss scale: 8192.0 | grad norm: 47844.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 659/ 159576 | consumed samples: 10544 | elapsed time per iteration (ms): 14055.2 | learning rate: 2.925E-06 | global batch size: 16 | lm loss: 7.913632E+00 | loss scale: 8192.0 | grad norm: 52680.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 660/ 159576 | consumed samples: 10560 | elapsed time per iteration (ms): 13952.7 | learning rate: 2.929E-06 | global batch size: 16 | lm loss: 7.682195E+00 | loss scale: 8192.0 | grad norm: 43818.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 661/ 159576 | consumed samples: 10576 | elapsed time per iteration (ms): 14150.0 | learning rate: 2.933E-06 | global batch size: 16 | lm loss: 7.787490E+00 | loss scale: 8192.0 | grad norm: 79352.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 662/ 159576 | consumed samples: 10592 | elapsed time per iteration (ms): 13865.0 | learning rate: 2.938E-06 | global batch size: 16 | lm loss: 7.774850E+00 | loss scale: 8192.0 | grad norm: 38730.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 663/ 159576 | consumed samples: 10608 | elapsed time per iteration (ms): 14161.1 | learning rate: 2.942E-06 | global batch size: 16 | lm loss: 7.580084E+00 | loss scale: 8192.0 | grad norm: 41013.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 664/ 159576 | consumed samples: 10624 | elapsed time per 
iteration (ms): 13917.2 | learning rate: 2.947E-06 | global batch size: 16 | lm loss: 7.885849E+00 | loss scale: 8192.0 | grad norm: 52940.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 665/ 159576 | consumed samples: 10640 | elapsed time per iteration (ms): 14187.3 | learning rate: 2.951E-06 | global batch size: 16 | lm loss: 7.708643E+00 | loss scale: 8192.0 | grad norm: 45471.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 666/ 159576 | consumed samples: 10656 | elapsed time per iteration (ms): 13816.1 | learning rate: 2.956E-06 | global batch size: 16 | lm loss: 7.852731E+00 | loss scale: 8192.0 | grad norm: 34948.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 667/ 159576 | consumed samples: 10672 | elapsed time per iteration (ms): 13998.2 | learning rate: 2.960E-06 | global batch size: 16 | lm loss: 7.783283E+00 | loss scale: 8192.0 | grad norm: 72415.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 668/ 159576 | consumed samples: 10688 | elapsed time per iteration (ms): 14355.3 | learning rate: 2.964E-06 | global batch size: 16 | lm loss: 7.606567E+00 | loss scale: 8192.0 | grad norm: 40358.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 669/ 159576 | consumed samples: 10704 | elapsed time per iteration (ms): 13737.0 | learning rate: 2.969E-06 | global batch size: 16 | lm loss: 7.726189E+00 | loss scale: 8192.0 | grad norm: 40258.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 670/ 159576 | consumed samples: 10720 | elapsed time per iteration (ms): 13793.7 | learning rate: 2.973E-06 | global batch size: 16 | lm loss: 7.691747E+00 | loss scale: 8192.0 | grad norm: 41826.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 671/ 159576 | consumed samples: 10736 | elapsed time per iteration (ms): 13990.9 | learning rate: 2.978E-06 | global batch size: 16 | lm loss: 7.731771E+00 | loss scale: 8192.0 | grad norm: 73683.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 672/ 159576 | consumed samples: 10752 | elapsed time per iteration (ms): 14342.7 | learning rate: 2.982E-06 | global batch size: 16 | lm loss: 7.751697E+00 | loss scale: 8192.0 | grad norm: 45162.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 673/ 159576 | consumed samples: 10768 | elapsed time per iteration (ms): 14019.6 | learning rate: 2.987E-06 | global batch size: 16 | lm loss: 7.628830E+00 | loss scale: 8192.0 | grad norm: 50354.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 674/ 159576 | consumed samples: 10784 | elapsed time per iteration (ms): 13505.9 | learning rate: 2.991E-06 | global batch size: 16 | lm loss: 7.737679E+00 | loss scale: 8192.0 | grad norm: 42630.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 675/ 159576 | consumed samples: 10800 | elapsed time per iteration (ms): 14062.7 | learning rate: 2.996E-06 | global batch size: 16 | lm loss: 7.697219E+00 | loss scale: 8192.0 | grad norm: 74141.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 676/ 159576 | consumed samples: 10816 | elapsed time per iteration (ms): 14348.9 | learning rate: 3.000E-06 | global batch size: 16 | lm loss: 7.685856E+00 | loss scale: 8192.0 | grad norm: 42229.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 677/ 159576 | consumed samples: 10832 | elapsed time per iteration (ms): 13490.6 | learning rate: 3.004E-06 | global batch size: 16 | lm loss: 7.675433E+00 | loss scale: 8192.0 | grad norm: 41266.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 678/ 159576 | consumed samples: 10848 | elapsed time per iteration (ms): 13864.0 | learning rate: 3.009E-06 | global batch size: 16 | lm loss: 7.602362E+00 | loss scale: 8192.0 | grad norm: 28128.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 679/ 159576 | consumed samples: 10864 | elapsed time per iteration (ms): 13876.8 | learning rate: 3.013E-06 | global batch size: 16 | lm loss: 7.921748E+00 | loss scale: 8192.0 | grad norm: 94093.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 680/ 159576 | consumed samples: 10880 | elapsed time per iteration (ms): 14089.6 | learning rate: 3.018E-06 | global batch size: 16 | lm loss: 7.932827E+00 | loss scale: 8192.0 | grad norm: 66492.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 681/ 159576 | consumed samples: 10896 | elapsed time per iteration (ms): 13869.3 | learning rate: 3.022E-06 | global batch size: 16 | lm loss: 7.712299E+00 | loss scale: 8192.0 | grad norm: 48293.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 682/ 159576 | consumed samples: 10912 | elapsed time per iteration (ms): 14135.1 | learning rate: 3.027E-06 | global batch size: 16 | lm loss: 7.638190E+00 | loss scale: 8192.0 | grad norm: 38847.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 683/ 159576 | consumed samples: 10928 | elapsed time per iteration (ms): 13923.5 | learning rate: 3.031E-06 | global batch size: 16 | lm loss: 7.728378E+00 | loss scale: 8192.0 | grad norm: 145094.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 684/ 159576 | consumed samples: 10944 | elapsed time per iteration (ms): 13370.2 | learning rate: 3.036E-06 | global batch size: 16 | lm loss: 7.695971E+00 | loss scale: 8192.0 | grad norm: 72337.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 685/ 159576 | consumed samples: 10960 | elapsed time per iteration (ms): 14077.4 | learning rate: 3.040E-06 | global batch size: 16 | lm loss: 7.967864E+00 | loss scale: 8192.0 | grad norm: 60013.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 686/ 159576 | consumed samples: 10976 | elapsed time per iteration (ms): 13866.9 | learning rate: 3.044E-06 | global batch size: 16 | lm loss: 7.790969E+00 | loss scale: 8192.0 | grad norm: 66989.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 687/ 159576 | consumed samples: 10992 | elapsed time per iteration (ms): 13994.5 | learning rate: 3.049E-06 | global batch size: 16 | lm loss: 7.558614E+00 | loss scale: 8192.0 | grad norm: 
41316.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 688/ 159576 | consumed samples: 11008 | elapsed time per iteration (ms): 13732.9 | learning rate: 3.053E-06 | global batch size: 16 | lm loss: 7.831646E+00 | loss scale: 8192.0 | grad norm: 113582.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 689/ 159576 | consumed samples: 11024 | elapsed time per iteration (ms): 14223.7 | learning rate: 3.058E-06 | global batch size: 16 | lm loss: 7.934176E+00 | loss scale: 8192.0 | grad norm: 88203.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 690/ 159576 | consumed samples: 11040 | elapsed time per iteration (ms): 14149.5 | learning rate: 3.062E-06 | global batch size: 16 | lm loss: 8.017797E+00 | loss scale: 8192.0 | grad norm: 58624.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 691/ 159576 | consumed samples: 11056 | elapsed time per iteration (ms): 13400.2 | learning rate: 3.067E-06 | global batch size: 16 | lm loss: 7.660833E+00 | loss scale: 8192.0 | grad norm: 55959.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 692/ 159576 | consumed samples: 11072 | elapsed time per iteration (ms): 13833.8 | learning rate: 3.071E-06 | global batch size: 16 | lm loss: 7.664068E+00 | loss scale: 8192.0 | grad norm: 59276.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 693/ 159576 | consumed samples: 11088 | elapsed time per iteration (ms): 14240.4 | learning rate: 3.075E-06 | global batch size: 16 | lm loss: 7.707018E+00 | loss scale: 8192.0 | grad norm: 93883.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 694/ 159576 | consumed samples: 11104 | elapsed time per iteration (ms): 13875.3 | learning rate: 3.080E-06 | global batch size: 16 | lm loss: 7.786274E+00 | loss scale: 8192.0 | grad norm: 64903.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 695/ 159576 | consumed samples: 11120 | elapsed time per iteration (ms): 13813.0 | learning rate: 3.084E-06 | global batch size: 16 | lm loss: 7.512930E+00 | loss scale: 8192.0 | grad norm: 51983.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 696/ 159576 | consumed samples: 11136 | elapsed time per iteration (ms): 13976.3 | learning rate: 3.089E-06 | global batch size: 16 | lm loss: 7.692935E+00 | loss scale: 8192.0 | grad norm: 60144.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 697/ 159576 | consumed samples: 11152 | elapsed time per iteration (ms): 14241.9 | learning rate: 3.093E-06 | global batch size: 16 | lm loss: 7.665162E+00 | loss scale: 8192.0 | grad norm: 45825.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 698/ 159576 | consumed samples: 11168 | elapsed time per iteration (ms): 13633.7 | learning rate: 3.098E-06 | global batch size: 16 | lm loss: 7.619460E+00 | loss scale: 8192.0 | grad norm: 50817.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 699/ 159576 | consumed samples: 11184 | elapsed time per iteration (ms): 13862.8 | learning rate: 3.102E-06 
| global batch size: 16 | lm loss: 7.827911E+00 | loss scale: 8192.0 | grad norm: 55475.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 700/ 159576 | consumed samples: 11200 | elapsed time per iteration (ms): 13992.4 | learning rate: 3.107E-06 | global batch size: 16 | lm loss: 7.651889E+00 | loss scale: 8192.0 | grad norm: 41255.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 701/ 159576 | consumed samples: 11216 | elapsed time per iteration (ms): 13980.6 | learning rate: 3.111E-06 | global batch size: 16 | lm loss: 7.715150E+00 | loss scale: 8192.0 | grad norm: 54466.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 702/ 159576 | consumed samples: 11232 | elapsed time per iteration (ms): 13968.4 | learning rate: 3.115E-06 | global batch size: 16 | lm loss: 7.782993E+00 | loss scale: 8192.0 | grad norm: 52144.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 703/ 159576 | consumed samples: 11248 | elapsed time per iteration (ms): 13960.9 | learning rate: 3.120E-06 | global batch size: 16 | lm loss: 7.681329E+00 | loss scale: 8192.0 | grad norm: 51153.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 704/ 159576 | consumed samples: 11264 | elapsed time per iteration (ms): 14082.5 | learning rate: 3.124E-06 | global batch size: 16 | lm loss: 7.697348E+00 | loss scale: 8192.0 | grad norm: 30117.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 705/ 159576 | consumed samples: 11280 | elapsed time per iteration (ms): 13980.4 | learning rate: 3.129E-06 | global batch size: 16 | lm loss: 7.733425E+00 | loss scale: 8192.0 | grad norm: 49027.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 706/ 159576 | consumed samples: 11296 | elapsed time per iteration (ms): 13865.4 | learning rate: 3.133E-06 | global batch size: 16 | lm loss: 7.844088E+00 | loss scale: 8192.0 | grad norm: 43555.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 707/ 159576 | consumed samples: 11312 | elapsed time per iteration (ms): 13817.5 | learning rate: 3.138E-06 | global batch size: 16 | lm loss: 7.752273E+00 | loss scale: 8192.0 | grad norm: 96517.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 708/ 159576 | consumed samples: 11328 | elapsed time per iteration (ms): 13958.9 | learning rate: 3.142E-06 | global batch size: 16 | lm loss: 7.757376E+00 | loss scale: 8192.0 | grad norm: 77216.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 709/ 159576 | consumed samples: 11344 | elapsed time per iteration (ms): 13428.3 | learning rate: 3.146E-06 | global batch size: 16 | lm loss: 7.687693E+00 | loss scale: 8192.0 | grad norm: 57064.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 710/ 159576 | consumed samples: 11360 | elapsed time per iteration (ms): 13648.2 | learning rate: 3.151E-06 | global batch size: 16 | lm loss: 7.663705E+00 | loss scale: 8192.0 | grad norm: 50512.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 711/ 159576 | consumed 
samples: 11376 | elapsed time per iteration (ms): 14017.0 | learning rate: 3.155E-06 | global batch size: 16 | lm loss: 7.597622E+00 | loss scale: 8192.0 | grad norm: 52114.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 712/ 159576 | consumed samples: 11392 | elapsed time per iteration (ms): 13780.7 | learning rate: 3.160E-06 | global batch size: 16 | lm loss: 7.771480E+00 | loss scale: 8192.0 | grad norm: 169756.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 713/ 159576 | consumed samples: 11408 | elapsed time per iteration (ms): 13096.8 | learning rate: 3.164E-06 | global batch size: 16 | lm loss: 7.713109E+00 | loss scale: 8192.0 | grad norm: 87094.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 714/ 159576 | consumed samples: 11424 | elapsed time per iteration (ms): 13743.9 | learning rate: 3.169E-06 | global batch size: 16 | lm loss: 7.749861E+00 | loss scale: 8192.0 | grad norm: 49749.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 715/ 159576 | consumed samples: 11440 | elapsed time per iteration (ms): 14274.0 | learning rate: 3.173E-06 | global batch size: 16 | lm loss: 7.797529E+00 | loss scale: 8192.0 | grad norm: 51932.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 716/ 159576 | consumed samples: 11456 | elapsed time per iteration (ms): 13788.8 | learning rate: 3.178E-06 | global batch size: 16 | lm loss: 7.704132E+00 | loss scale: 8192.0 | grad norm: 68478.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 717/ 159576 | consumed samples: 11472 | elapsed time per iteration (ms): 13977.5 | learning rate: 3.182E-06 | global batch size: 16 | lm loss: 7.746219E+00 | loss scale: 8192.0 | grad norm: 107770.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 718/ 159576 | consumed samples: 11488 | elapsed time per iteration (ms): 13786.8 | learning rate: 3.186E-06 | global batch size: 16 | lm loss: 7.617724E+00 | loss scale: 8192.0 | grad norm: 57419.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 719/ 159576 | consumed samples: 11504 | elapsed time per iteration (ms): 14003.5 | learning rate: 3.191E-06 | global batch size: 16 | lm loss: 7.642632E+00 | loss scale: 8192.0 | grad norm: 48000.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 720/ 159576 | consumed samples: 11520 | elapsed time per iteration (ms): 13651.1 | learning rate: 3.195E-06 | global batch size: 16 | lm loss: 7.790938E+00 | loss scale: 8192.0 | grad norm: 45384.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 721/ 159576 | consumed samples: 11536 | elapsed time per iteration (ms): 13820.3 | learning rate: 3.200E-06 | global batch size: 16 | lm loss: 7.799318E+00 | loss scale: 8192.0 | grad norm: 94827.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 722/ 159576 | consumed samples: 11552 | elapsed time per iteration (ms): 13998.9 | learning rate: 3.204E-06 | global batch size: 16 | lm loss: 7.924202E+00 | loss scale: 8192.0 | grad norm: 106713.536 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 723/ 159576 | consumed samples: 11568 | elapsed time per iteration (ms): 13787.6 | learning rate: 3.209E-06 | global batch size: 16 | lm loss: 7.662113E+00 | loss scale: 8192.0 | grad norm: 53132.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 724/ 159576 | consumed samples: 11584 | elapsed time per iteration (ms): 14003.4 | learning rate: 3.213E-06 | global batch size: 16 | lm loss: 7.735355E+00 | loss scale: 8192.0 | grad norm: 46503.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 725/ 159576 | consumed samples: 11600 | elapsed time per iteration (ms): 14211.4 | learning rate: 3.217E-06 | global batch size: 16 | lm loss: 7.413515E+00 | loss scale: 8192.0 | grad norm: 46300.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 726/ 159576 | consumed samples: 11616 | elapsed time per iteration (ms): 14085.1 | learning rate: 3.222E-06 | global batch size: 16 | lm loss: 7.793005E+00 | loss scale: 8192.0 | grad norm: 123901.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 727/ 159576 | consumed samples: 11632 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.226E-06 | global batch size: 16 | lm loss: 7.570110E+00 | loss scale: 8192.0 | grad norm: 110746.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 728/ 159576 | consumed samples: 11648 | elapsed time per iteration (ms): 13944.5 | learning rate: 3.231E-06 | global batch size: 16 | lm loss: 7.805285E+00 | loss scale: 8192.0 | grad norm: 54666.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 729/ 159576 | consumed samples: 11664 | elapsed time per iteration (ms): 13478.9 | learning rate: 3.235E-06 | global batch size: 16 | lm loss: 7.702326E+00 | loss scale: 8192.0 | grad norm: 95219.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 730/ 159576 | consumed samples: 11680 | elapsed time per iteration (ms): 13419.9 | learning rate: 3.240E-06 | global batch size: 16 | lm loss: 7.694516E+00 | loss scale: 8192.0 | grad norm: 44428.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 731/ 159576 | consumed samples: 11696 | elapsed time per iteration (ms): 13890.7 | learning rate: 3.244E-06 | global batch size: 16 | lm loss: 7.656667E+00 | loss scale: 8192.0 | grad norm: 79142.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 732/ 159576 | consumed samples: 11712 | elapsed time per iteration (ms): 14381.2 | learning rate: 3.249E-06 | global batch size: 16 | lm loss: 7.689932E+00 | loss scale: 8192.0 | grad norm: 69883.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 05:03:31] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1162855_[1-10%1] on 'gpu_p13' partition) [2021-09-24 05:03:31] PULSE: tr8-104B is running for 1:02:26 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition 
(r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 733/ 159576 | consumed samples: 11728 | elapsed time per iteration (ms): 13725.2 | learning rate: 3.253E-06 | global batch size: 16 | lm loss: 7.808900E+00 | loss scale: 8192.0 | grad norm: 50692.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 734/ 159576 | consumed samples: 11744 | elapsed time per iteration (ms): 13115.2 | learning rate: 3.257E-06 | global batch size: 16 | lm loss: 7.737029E+00 | loss scale: 8192.0 | grad norm: 69149.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 735/ 159576 | consumed samples: 11760 | elapsed time per iteration (ms): 13493.9 | learning rate: 3.262E-06 | global batch size: 16 | lm loss: 7.630354E+00 | loss scale: 8192.0 | grad norm: 85240.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 736/ 159576 | consumed samples: 11776 | elapsed time per iteration (ms): 13636.0 | learning rate: 3.266E-06 | global batch size: 16 | lm loss: 7.626644E+00 | loss scale: 8192.0 | grad norm: 57646.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 737/ 159576 | consumed samples: 11792 | elapsed time per iteration (ms): 13810.1 | learning rate: 3.271E-06 | global batch size: 16 | lm loss: 7.526936E+00 | loss scale: 8192.0 | grad norm: 95065.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 738/ 159576 | consumed samples: 11808 | elapsed time per iteration (ms): 13385.6 | learning rate: 3.275E-06 | global batch size: 16 | lm loss: 7.820796E+00 | loss scale: 8192.0 | grad norm: 113407.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 739/ 159576 | consumed samples: 11824 | elapsed time per iteration (ms): 13689.8 | learning rate: 3.280E-06 | global batch size: 16 | lm loss: 7.774467E+00 | loss scale: 8192.0 | grad norm: 98657.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 740/ 159576 | consumed samples: 11840 | elapsed time per iteration (ms): 13965.2 | learning rate: 3.284E-06 | global batch size: 16 | lm loss: 7.762564E+00 | loss scale: 8192.0 | grad norm: 71745.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 741/ 159576 | consumed samples: 11856 | elapsed time per iteration (ms): 13569.2 | learning rate: 3.288E-06 | global batch size: 16 | lm loss: 7.608281E+00 | loss scale: 8192.0 | grad norm: 40905.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 742/ 159576 | consumed samples: 11872 | elapsed time per iteration (ms): 13635.8 | learning rate: 3.293E-06 | global batch size: 16 | lm loss: 7.570668E+00 | loss scale: 8192.0 | grad norm: 80257.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 743/ 159576 | consumed samples: 11888 | elapsed time per iteration (ms): 13669.8 | learning rate: 3.297E-06 | global batch size: 16 | lm loss: 7.586653E+00 | loss scale: 8192.0 | grad norm: 56412.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 744/ 159576 | consumed samples: 11904 | elapsed time per iteration (ms): 13473.9 | learning rate: 3.302E-06 | global batch size: 16 | lm loss: 7.701398E+00 | loss scale: 8192.0 | grad norm: 100221.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 745/ 159576 | consumed samples: 11920 | elapsed time per iteration (ms): 13453.8 | learning rate: 3.306E-06 | global batch size: 16 | lm loss: 7.772648E+00 | loss scale: 8192.0 | grad norm: 88519.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 746/ 159576 | consumed samples: 11936 | elapsed time per iteration (ms): 13732.5 | learning rate: 3.311E-06 | global batch size: 16 | lm loss: 7.940891E+00 | loss scale: 8192.0 | grad norm: 66980.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 747/ 159576 | consumed samples: 11952 | elapsed time per iteration (ms): 13956.5 | learning rate: 3.315E-06 | global batch size: 16 | lm loss: 7.879022E+00 | loss scale: 8192.0 | grad norm: 73008.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 748/ 159576 | consumed samples: 11968 | elapsed time per iteration (ms): 13250.5 | learning rate: 3.320E-06 | global batch size: 16 | lm loss: 7.693480E+00 | loss scale: 8192.0 | grad norm: 45346.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 749/ 159576 | consumed samples: 11984 | elapsed time per iteration (ms): 13529.3 | learning rate: 3.324E-06 | global batch size: 16 | lm loss: 7.658270E+00 | loss scale: 8192.0 | grad norm: 156261.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 750/ 159576 | consumed samples: 12000 | elapsed time per iteration (ms): 14110.0 | learning rate: 3.328E-06 | global batch size: 16 | lm loss: 7.741945E+00 | loss scale: 8192.0 | grad norm: 121818.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 751/ 159576 | consumed samples: 12016 | elapsed time per iteration (ms): 13463.3 | learning rate: 3.333E-06 | global batch size: 16 | lm loss: 7.631550E+00 | loss scale: 8192.0 | grad norm: 69835.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 752/ 159576 | consumed samples: 12032 | elapsed time per iteration (ms): 13424.2 | learning rate: 3.337E-06 | global batch size: 16 | lm loss: 7.669878E+00 | loss scale: 8192.0 | grad norm: 47821.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 753/ 159576 | consumed samples: 12048 | elapsed time per iteration (ms): 13566.2 | learning rate: 3.342E-06 | global batch size: 16 | lm loss: 7.567214E+00 | loss scale: 8192.0 | grad norm: 68234.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 754/ 159576 | consumed samples: 12064 | elapsed time per iteration (ms): 14065.3 | learning rate: 3.346E-06 | global batch size: 16 | lm loss: 7.753268E+00 | loss scale: 8192.0 | grad norm: 134900.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 755/ 159576 | consumed samples: 12080 | elapsed time per iteration (ms): 13518.6 | learning rate: 3.351E-06 | global batch size: 16 | lm loss: 7.552173E+00 | loss scale: 8192.0 | grad norm: 48964.281 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 756/ 159576 | consumed samples: 12096 | elapsed time per iteration (ms): 13728.7 | learning rate: 3.355E-06 | global batch size: 16 | lm loss: 7.735795E+00 | loss scale: 8192.0 | grad norm: 73204.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 757/ 159576 | consumed samples: 12112 | elapsed time per iteration (ms): 14082.3 | learning rate: 3.359E-06 | global batch size: 16 | lm loss: 7.910018E+00 | loss scale: 8192.0 | grad norm: 83429.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 758/ 159576 | consumed samples: 12128 | elapsed time per iteration (ms): 13428.5 | learning rate: 3.364E-06 | global batch size: 16 | lm loss: 7.669195E+00 | loss scale: 8192.0 | grad norm: 61137.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 759/ 159576 | consumed samples: 12144 | elapsed time per iteration (ms): 13632.1 | learning rate: 3.368E-06 | global batch size: 16 | lm loss: 7.795278E+00 | loss scale: 8192.0 | grad norm: 59141.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 760/ 159576 | consumed samples: 12160 | elapsed time per iteration (ms): 13624.6 | learning rate: 3.373E-06 | global batch size: 16 | lm loss: 7.692988E+00 | loss scale: 8192.0 | grad norm: 104447.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 761/ 159576 | consumed samples: 12176 | elapsed time per iteration (ms): 13611.0 | learning rate: 3.377E-06 | global batch size: 16 | lm loss: 7.784515E+00 | loss scale: 8192.0 | grad norm: 51368.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 762/ 159576 | consumed samples: 12192 | elapsed time per iteration (ms): 13558.6 | learning rate: 3.382E-06 | global batch size: 16 | lm loss: 7.582584E+00 | loss scale: 8192.0 | grad norm: 61983.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 763/ 159576 | consumed samples: 12208 | elapsed time per iteration (ms): 13793.4 | learning rate: 3.386E-06 | global batch size: 16 | lm loss: 7.743572E+00 | loss scale: 8192.0 | grad norm: 56837.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 764/ 159576 | consumed samples: 12224 | elapsed time per iteration (ms): 13743.7 | learning rate: 3.391E-06 | global batch size: 16 | lm loss: 7.701952E+00 | loss scale: 8192.0 | grad norm: 92476.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 765/ 159576 | consumed samples: 12240 | elapsed time per iteration (ms): 13529.8 | learning rate: 3.395E-06 | global batch size: 16 | lm loss: 7.691103E+00 | loss scale: 8192.0 | grad norm: 103276.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 766/ 159576 | consumed samples: 12256 | elapsed time per iteration (ms): 13189.2 | learning rate: 3.399E-06 | global batch size: 16 | lm loss: 7.589336E+00 | loss scale: 8192.0 | grad norm: 54735.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 767/ 159576 | consumed samples: 12272 | elapsed time per iteration (ms): 13483.6 | learning rate: 3.404E-06 | global 
batch size: 16 | lm loss: 7.717595E+00 | loss scale: 8192.0 | grad norm: 54456.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 768/ 159576 | consumed samples: 12288 | elapsed time per iteration (ms): 13780.9 | learning rate: 3.408E-06 | global batch size: 16 | lm loss: 7.852913E+00 | loss scale: 8192.0 | grad norm: 88912.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 769/ 159576 | consumed samples: 12304 | elapsed time per iteration (ms): 13724.3 | learning rate: 3.413E-06 | global batch size: 16 | lm loss: 7.716819E+00 | loss scale: 8192.0 | grad norm: 102833.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 770/ 159576 | consumed samples: 12320 | elapsed time per iteration (ms): 13377.3 | learning rate: 3.417E-06 | global batch size: 16 | lm loss: 7.597641E+00 | loss scale: 8192.0 | grad norm: 50835.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 771/ 159576 | consumed samples: 12336 | elapsed time per iteration (ms): 13692.5 | learning rate: 3.422E-06 | global batch size: 16 | lm loss: 7.478999E+00 | loss scale: 8192.0 | grad norm: 53587.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 772/ 159576 | consumed samples: 12352 | elapsed time per iteration (ms): 14180.5 | learning rate: 3.426E-06 | global batch size: 16 | lm loss: 7.546258E+00 | loss scale: 8192.0 | grad norm: 63294.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 773/ 159576 | consumed samples: 12368 | elapsed time per iteration (ms): 13096.5 | learning rate: 3.430E-06 | global batch size: 16 | lm loss: 7.711743E+00 | loss scale: 8192.0 | grad norm: 99934.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 774/ 159576 | consumed samples: 12384 | elapsed time per iteration (ms): 13520.5 | learning rate: 3.435E-06 | global batch size: 16 | lm loss: 7.645664E+00 | loss scale: 8192.0 | grad norm: 56458.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 775/ 159576 | consumed samples: 12400 | elapsed time per iteration (ms): 13630.5 | learning rate: 3.439E-06 | global batch size: 16 | lm loss: 7.603559E+00 | loss scale: 8192.0 | grad norm: 46450.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 776/ 159576 | consumed samples: 12416 | elapsed time per iteration (ms): 14027.6 | learning rate: 3.444E-06 | global batch size: 16 | lm loss: 7.737686E+00 | loss scale: 8192.0 | grad norm: 141770.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 777/ 159576 | consumed samples: 12432 | elapsed time per iteration (ms): 13425.6 | learning rate: 3.448E-06 | global batch size: 16 | lm loss: 7.584914E+00 | loss scale: 8192.0 | grad norm: 124071.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 778/ 159576 | consumed samples: 12448 | elapsed time per iteration (ms): 13642.7 | learning rate: 3.453E-06 | global batch size: 16 | lm loss: 7.606685E+00 | loss scale: 8192.0 | grad norm: 53139.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 779/ 159576 | consumed samples: 
iteration 779/ 159576 | consumed samples: 12464 | elapsed time per iteration (ms): 13834.1 | learning rate: 3.457E-06 | global batch size: 16 | lm loss: 7.786515E+00 | loss scale: 8192.0 | grad norm: 58657.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 780/ 159576 | consumed samples: 12480 | elapsed time per iteration (ms): 13091.5 | learning rate: 3.462E-06 | global batch size: 16 | lm loss: 7.618142E+00 | loss scale: 8192.0 | grad norm: 37881.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 781/ 159576 | consumed samples: 12496 | elapsed time per iteration (ms): 14146.0 | learning rate: 3.466E-06 | global batch size: 16 | lm loss: 7.906812E+00 | loss scale: 8192.0 | grad norm: 114163.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 782/ 159576 | consumed samples: 12512 | elapsed time per iteration (ms): 14025.7 | learning rate: 3.470E-06 | global batch size: 16 | lm loss: 7.566094E+00 | loss scale: 8192.0 | grad norm: 46220.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 783/ 159576 | consumed samples: 12528 | elapsed time per iteration (ms): 13895.4 | learning rate: 3.475E-06 | global batch size: 16 | lm loss: 7.630446E+00 | loss scale: 8192.0 | grad norm: 64319.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 784/ 159576 | consumed samples: 12544 | elapsed time per iteration (ms): 13890.1 | learning rate: 3.479E-06 | global batch size: 16 | lm loss: 7.692337E+00 | loss scale: 8192.0 | grad norm: 48575.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 785/ 159576 | consumed samples: 12560 | elapsed time per iteration (ms): 14156.1 | learning rate: 3.484E-06 | global batch size: 16 | lm loss: 7.736514E+00 | loss scale: 8192.0 | grad norm: 90651.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 786/ 159576 | consumed samples: 12576 | elapsed time per iteration (ms): 14206.7 | learning rate: 3.488E-06 | global batch size: 16 | lm loss: 7.744794E+00 | loss scale: 8192.0 | grad norm: 84355.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 787/ 159576 | consumed samples: 12592 | elapsed time per iteration (ms): 13622.2 | learning rate: 3.493E-06 | global batch size: 16 | lm loss: 7.672806E+00 | loss scale: 8192.0 | grad norm: 51705.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 788/ 159576 | consumed samples: 12608 | elapsed time per iteration (ms): 13771.2 | learning rate: 3.497E-06 | global batch size: 16 | lm loss: 7.713612E+00 | loss scale: 8192.0 | grad norm: 50748.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 789/ 159576 | consumed samples: 12624 | elapsed time per iteration (ms): 14226.1 | learning rate: 3.501E-06 | global batch size: 16 | lm loss: 7.630927E+00 | loss scale: 8192.0 | grad norm: 68226.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 790/ 159576 | consumed samples: 12640 | elapsed time per iteration (ms): 14175.2 | learning rate: 3.506E-06 | global batch size: 16 | lm loss: 7.523444E+00 | loss scale: 8192.0 | grad norm: 67731.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 791/ 159576 | consumed samples: 12656 | elapsed time per iteration (ms): 13844.2 | learning rate: 3.510E-06 | global batch size: 16 | lm loss: 7.357096E+00 | loss scale: 8192.0 | grad norm: 45569.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 792/ 159576 | consumed samples: 12672 | elapsed time per iteration (ms): 13884.3 | learning rate: 3.515E-06 | global batch size: 16 | lm loss: 7.701885E+00 | loss scale: 8192.0 | grad norm: 53017.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 793/ 159576 | consumed samples: 12688 | elapsed time per iteration (ms): 14159.9 | learning rate: 3.519E-06 | global batch size: 16 | lm loss: 7.529918E+00 | loss scale: 8192.0 | grad norm: 55466.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 794/ 159576 | consumed samples: 12704 | elapsed time per iteration (ms): 13975.0 | learning rate: 3.524E-06 | global batch size: 16 | lm loss: 7.684763E+00 | loss scale: 8192.0 | grad norm: 44801.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 795/ 159576 | consumed samples: 12720 | elapsed time per iteration (ms): 13769.3 | learning rate: 3.528E-06 | global batch size: 16 | lm loss: 7.843237E+00 | loss scale: 8192.0 | grad norm: 59761.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 796/ 159576 | consumed samples: 12736 | elapsed time per iteration (ms): 13954.1 | learning rate: 3.533E-06 | global batch size: 16 | lm loss: 7.737316E+00 | loss scale: 8192.0 | grad norm: 66240.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 797/ 159576 | consumed samples: 12752 | elapsed time per iteration (ms): 13982.4 | learning rate: 3.537E-06 | global batch size: 16 | lm loss: 7.712746E+00 | loss scale: 8192.0 | grad norm: 53315.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 798/ 159576 | consumed samples: 12768 | elapsed time per iteration (ms): 14164.1 | learning rate: 3.541E-06 | global batch size: 16 | lm loss: 7.649867E+00 | loss scale: 8192.0 | grad norm: 46451.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 799/ 159576 | consumed samples: 12784 | elapsed time per iteration (ms): 14010.0 | learning rate: 3.546E-06 | global batch size: 16 | lm loss: 7.833376E+00 | loss scale: 8192.0 | grad norm: 65829.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 800/ 159576 | consumed samples: 12800 | elapsed time per iteration (ms): 14307.9 | learning rate: 3.550E-06 | global batch size: 16 | lm loss: 7.790625E+00 | loss scale: 8192.0 | grad norm: 71968.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 801/ 159576 | consumed samples: 12816 | elapsed time per iteration (ms): 13972.6 | learning rate: 3.555E-06 | global batch size: 16 | lm loss: 7.611866E+00 | loss scale: 8192.0 | grad norm: 48597.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 802/ 159576 | consumed samples: 12832 | elapsed time per iteration (ms): 13959.0 | learning rate: 3.559E-06 | global batch size: 16 | lm loss: 7.617666E+00 | loss scale: 8192.0 | grad norm: 147672.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 803/ 159576 | consumed samples: 12848 | elapsed time per iteration (ms): 13806.4 | learning rate: 3.564E-06 | global batch size: 16 | lm loss: 7.813154E+00 | loss scale: 8192.0 | grad norm: 121980.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 804/ 159576 | consumed samples: 12864 | elapsed time per iteration (ms): 13949.2 | learning rate: 3.568E-06 | global batch size: 16 | lm loss: 7.654176E+00 | loss scale: 8192.0 | grad norm: 52351.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 805/ 159576 | consumed samples: 12880 | elapsed time per iteration (ms): 13801.9 | learning rate: 3.572E-06 | global batch size: 16 | lm loss: 7.564305E+00 | loss scale: 8192.0 | grad norm: 62792.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 806/ 159576 | consumed samples: 12896 | elapsed time per iteration (ms): 13954.3 | learning rate: 3.577E-06 | global batch size: 16 | lm loss: 7.707185E+00 | loss scale: 8192.0 | grad norm: 64767.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 807/ 159576 | consumed samples: 12912 | elapsed time per iteration (ms): 14250.4 | learning rate: 3.581E-06 | global batch size: 16 | lm loss: 7.578569E+00 | loss scale: 8192.0 | grad norm: 73926.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 808/ 159576 | consumed samples: 12928 | elapsed time per iteration (ms): 14201.0 | learning rate: 3.586E-06 | global batch size: 16 | lm loss: 7.631069E+00 | loss scale: 8192.0 | grad norm: 110069.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 809/ 159576 | consumed samples: 12944 | elapsed time per iteration (ms): 13598.4 | learning rate: 3.590E-06 | global batch size: 16 | lm loss: 7.628491E+00 | loss scale: 8192.0 | grad norm: 49670.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 810/ 159576 | consumed samples: 12960 | elapsed time per iteration (ms): 13941.6 | learning rate: 3.595E-06 | global batch size: 16 | lm loss: 7.759563E+00 | loss scale: 8192.0 | grad norm: 45971.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 811/ 159576 | consumed samples: 12976 | elapsed time per iteration (ms): 14298.0 | learning rate: 3.599E-06 | global batch size: 16 | lm loss: 7.502759E+00 | loss scale: 8192.0 | grad norm: 77602.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 812/ 159576 | consumed samples: 12992 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.604E-06 | global batch size: 16 | lm loss: 7.624804E+00 | loss scale: 8192.0 | grad norm: 95989.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 813/ 159576 | consumed samples: 13008 | elapsed time per iteration (ms): 13579.1 | learning rate: 3.608E-06 | global batch size: 16 | lm loss: 7.542982E+00 | loss scale: 8192.0 | grad norm: 52064.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 814/ 159576 | consumed samples: 13024 | elapsed time per iteration (ms): 14100.2 | learning rate: 3.612E-06 | global batch size: 16 | lm loss: 7.676429E+00 | loss scale: 8192.0 | grad norm: 38221.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 815/ 159576 | consumed samples: 13040 | elapsed time per iteration (ms): 14346.2 | learning rate: 3.617E-06 | global batch size: 16 | lm loss: 7.695131E+00 | loss scale: 8192.0 | grad norm: 57869.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 816/ 159576 | consumed samples: 13056 | elapsed time per iteration (ms): 13771.7 | learning rate: 3.621E-06 | global batch size: 16 | lm loss: 7.578337E+00 | loss scale: 8192.0 | grad norm: 49771.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 817/ 159576 | consumed samples: 13072 | elapsed time per iteration (ms): 13776.0 | learning rate: 3.626E-06 | global batch size: 16 | lm loss: 7.583301E+00 | loss scale: 8192.0 | grad norm: 46160.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 818/ 159576 | consumed samples: 13088 | elapsed time per iteration (ms): 14040.8 | learning rate: 3.630E-06 | global batch size: 16 | lm loss: 7.773385E+00 | loss scale: 8192.0 | grad norm: 42207.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 819/ 159576 | consumed samples: 13104 | elapsed time per iteration (ms): 13835.3 | learning rate: 3.635E-06 | global batch size: 16 | lm loss: 7.905573E+00 | loss scale: 8192.0 | grad norm: 111883.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 820/ 159576 | consumed samples: 13120 | elapsed time per iteration (ms): 13924.4 | learning rate: 3.639E-06 | global batch size: 16 | lm loss: 7.730550E+00 | loss scale: 8192.0 | grad norm: 75433.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 821/ 159576 | consumed samples: 13136 | elapsed time per iteration (ms): 13915.0 | learning rate: 3.643E-06 | global batch size: 16 | lm loss: 7.688564E+00 | loss scale: 8192.0 | grad norm: 41927.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 822/ 159576 | consumed samples: 13152 | elapsed time per iteration (ms): 13890.4 | learning rate: 3.648E-06 | global batch size: 16 | lm loss: 7.552343E+00 | loss scale: 8192.0 | grad norm: 96543.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 823/ 159576 | consumed samples: 13168 | elapsed time per iteration (ms): 13560.6 | learning rate: 3.652E-06 | global batch size: 16 | lm loss: 7.617982E+00 | loss scale: 8192.0 | grad norm: 56370.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 824/ 159576 | consumed samples: 13184 | elapsed time per iteration (ms): 14024.1 | learning rate: 3.657E-06 | global batch size: 16 | lm loss: 7.600199E+00 | loss scale: 8192.0 | grad norm: 61928.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 825/ 159576 | consumed samples: 13200 | elapsed time per iteration (ms): 14003.2 | learning rate: 3.661E-06 | global batch size: 16 | lm loss: 7.541789E+00 | loss scale: 8192.0 | grad norm: 56863.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 826/ 159576 | consumed samples: 13216 | elapsed time per iteration (ms): 13848.3 | learning rate: 3.666E-06 | global batch size: 16 | lm loss: 7.782004E+00 | loss scale: 8192.0 | grad norm: 59985.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 827/ 159576 | consumed samples: 13232 | elapsed time per iteration (ms): 13902.1 | learning rate: 3.670E-06 | global batch size: 16 | lm loss: 7.733065E+00 | loss scale: 8192.0 | grad norm: 39148.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 828/ 159576 | consumed samples: 13248 | elapsed time per iteration (ms): 14356.1 | learning rate: 3.675E-06 | global batch size: 16 | lm loss: 7.625387E+00 | loss scale: 8192.0 | grad norm: 56612.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 829/ 159576 | consumed samples: 13264 | elapsed time per iteration (ms): 14368.0 | learning rate: 3.679E-06 | global batch size: 16 | lm loss: 7.759684E+00 | loss scale: 8192.0 | grad norm: 67635.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 830/ 159576 | consumed samples: 13280 | elapsed time per iteration (ms): 13627.9 | learning rate: 3.683E-06 | global batch size: 16 | lm loss: 7.694915E+00 | loss scale: 8192.0 | grad norm: 60776.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 831/ 159576 | consumed samples: 13296 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.688E-06 | global batch size: 16 | lm loss: 7.492978E+00 | loss scale: 8192.0 | grad norm: 42000.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 832/ 159576 | consumed samples: 13312 | elapsed time per iteration (ms): 13938.9 | learning rate: 3.692E-06 | global batch size: 16 | lm loss: 7.616700E+00 | loss scale: 8192.0 | grad norm: 105579.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 833/ 159576 | consumed samples: 13328 | elapsed time per iteration (ms): 13687.8 | learning rate: 3.697E-06 | global batch size: 16 | lm loss: 7.715961E+00 | loss scale: 8192.0 | grad norm: 78119.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 834/ 159576 | consumed samples: 13344 | elapsed time per iteration (ms): 13717.8 | learning rate: 3.701E-06 | global batch size: 16 | lm loss: 7.778497E+00 | loss scale: 8192.0 | grad norm: 58326.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 835/ 159576 | consumed samples: 13360 | elapsed time per iteration (ms): 13913.9 | learning rate: 3.706E-06 | global batch size: 16 | lm loss: 7.718093E+00 | loss scale: 8192.0 | grad norm: 48122.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 836/ 159576 | consumed samples: 13376 | elapsed time per iteration (ms): 14318.5 | learning rate: 3.710E-06 | global batch size: 16 | lm loss: 7.521303E+00 | loss scale: 8192.0 | grad norm: 60082.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 837/ 159576 | consumed samples: 13392 | elapsed time per iteration (ms): 13780.0 | learning rate: 3.714E-06 | global batch size: 16 | lm loss: 7.538383E+00 | loss scale: 8192.0 | grad norm: 61043.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 838/ 159576 | consumed samples: 13408 | elapsed time per iteration (ms): 13961.2 | learning rate: 3.719E-06 | global batch size: 16 | lm loss: 7.548276E+00 | loss scale: 8192.0 | grad norm: 58423.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 839/ 159576 | consumed samples: 13424 | elapsed time per iteration (ms): 14239.6 | learning rate: 3.723E-06 | global batch size: 16 | lm loss: 7.618182E+00 | loss scale: 8192.0 | grad norm: 48500.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 840/ 159576 | consumed samples: 13440 | elapsed time per iteration (ms): 13752.3 | learning rate: 3.728E-06 | global batch size: 16 | lm loss: 7.595082E+00 | loss scale: 8192.0 | grad norm: 50825.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 841/ 159576 | consumed samples: 13456 | elapsed time per iteration (ms): 14199.3 | learning rate: 3.732E-06 | global batch size: 16 | lm loss: 7.492725E+00 | loss scale: 8192.0 | grad norm: 56977.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 842/ 159576 | consumed samples: 13472 | elapsed time per iteration (ms): 13925.4 | learning rate: 3.737E-06 | global batch size: 16 | lm loss: 7.783816E+00 | loss scale: 8192.0 | grad norm: 40797.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 843/ 159576 | consumed samples: 13488 | elapsed time per iteration (ms): 14119.4 | learning rate: 3.741E-06 | global batch size: 16 | lm loss: 7.606951E+00 | loss scale: 8192.0 | grad norm: 50890.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 844/ 159576 | consumed samples: 13504 | elapsed time per iteration (ms): 13941.8 | learning rate: 3.746E-06 | global batch size: 16 | lm loss: 7.638199E+00 | loss scale: 8192.0 | grad norm: 52652.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 845/ 159576 | consumed samples: 13520 | elapsed time per iteration (ms): 14424.1 | learning rate: 3.750E-06 | global batch size: 16 | lm loss: 7.555171E+00 | loss scale: 8192.0 | grad norm: 48298.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 846/ 159576 | consumed samples: 13536 | elapsed time per iteration (ms): 14202.9 | learning rate: 3.754E-06 | global batch size: 16 | lm loss: 7.651504E+00 | loss scale: 8192.0 | grad norm: 76618.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 847/ 159576 | consumed samples: 13552 | elapsed time per iteration (ms): 13785.9 | learning rate: 3.759E-06 | global batch size: 16 | lm loss: 7.914087E+00 | loss scale: 8192.0 | grad norm: 40970.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 848/ 159576 | consumed samples: 13568 | elapsed time per iteration (ms): 13892.7 | learning rate: 3.763E-06 | global batch size: 16 | lm loss: 7.714731E+00 | loss scale: 8192.0 | grad norm: 47666.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 849/ 159576 | consumed samples: 13584 | elapsed time per iteration (ms): 13608.6 | learning rate: 3.768E-06 | global batch size: 16 | lm loss: 7.566309E+00 | loss scale: 8192.0 | grad norm: 56337.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 850/ 159576 | consumed samples: 13600 | elapsed time per iteration (ms): 13752.1 | learning rate: 3.772E-06 | global batch size: 16 | lm loss: 7.621016E+00 | loss scale: 8192.0 | grad norm: 55695.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 851/ 159576 | consumed samples: 13616 | elapsed time per iteration (ms): 13514.6 | learning rate: 3.777E-06 | global batch size: 16 | lm loss: 7.510153E+00 | loss scale: 8192.0 | grad norm: 70852.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 852/ 159576 | consumed samples: 13632 | elapsed time per iteration (ms): 13536.1 | learning rate: 3.781E-06 | global batch size: 16 | lm loss: 7.417966E+00 | loss scale: 8192.0 | grad norm: 43169.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 853/ 159576 | consumed samples: 13648 | elapsed time per iteration (ms): 14116.4 | learning rate: 3.786E-06 | global batch size: 16 | lm loss: 7.490001E+00 | loss scale: 8192.0 | grad norm: 61980.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 854/ 159576 | consumed samples: 13664 | elapsed time per iteration (ms): 14372.8 | learning rate: 3.790E-06 | global batch size: 16 | lm loss: 7.555287E+00 | loss scale: 8192.0 | grad norm: 43650.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 855/ 159576 | consumed samples: 13680 | elapsed time per iteration (ms): 13154.5 | learning rate: 3.794E-06 | global batch size: 16 | lm loss: 7.628311E+00 | loss scale: 8192.0 | grad norm: 32290.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 856/ 159576 | consumed samples: 13696 | elapsed time per iteration (ms): 13509.6 | learning rate: 3.799E-06 | global batch size: 16 | lm loss: 7.757495E+00 | loss scale: 8192.0 | grad norm: 94063.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 857/ 159576 | consumed samples: 13712 | elapsed time per iteration (ms): 14015.7 | learning rate: 3.803E-06 | global batch size: 16 | lm loss: 7.733263E+00 | loss scale: 8192.0 | grad norm: 53189.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 858/ 159576 | consumed samples: 13728 | elapsed time per iteration (ms): 14357.8 | learning rate: 3.808E-06 | global batch size: 16 | lm loss: 7.570580E+00 | loss scale: 8192.0 | grad norm: 57239.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 859/ 159576 | consumed samples: 13744 | elapsed time per iteration (ms): 13954.6 | learning rate: 3.812E-06 | global batch size: 16 | lm loss: 7.593122E+00 | loss scale: 8192.0 | grad norm: 45414.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 860/ 159576 | consumed samples: 13760 | elapsed time per iteration (ms): 14212.3 | learning rate: 3.817E-06 | global batch size: 16 | lm loss: 7.571471E+00 | loss scale: 8192.0 | grad norm: 75659.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 861/ 159576 | consumed samples: 13776 | elapsed time per iteration (ms): 14044.0 | learning rate: 3.821E-06 | global batch size: 16 | lm loss: 7.599829E+00 | loss scale: 8192.0 | grad norm: 47651.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 862/ 159576 | consumed samples: 13792 | elapsed time per iteration (ms): 13529.5 | learning rate: 3.825E-06 | global batch size: 16 | lm loss: 7.427186E+00 | loss scale: 8192.0 | grad norm: 76377.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 863/ 159576 | consumed samples: 13808 | elapsed time per iteration (ms): 14057.3 | learning rate: 3.830E-06 | global batch size: 16 | lm loss: 7.736305E+00 | loss scale: 8192.0 | grad norm: 76320.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 864/ 159576 | consumed samples: 13824 | elapsed time per iteration (ms): 14064.2 | learning rate: 3.834E-06 | global batch size: 16 | lm loss: 7.637553E+00 | loss scale: 8192.0 | grad norm: 56695.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 865/ 159576 | consumed samples: 13840 | elapsed time per iteration (ms): 14009.0 | learning rate: 3.839E-06 | global batch size: 16 | lm loss: 7.709378E+00 | loss scale: 8192.0 | grad norm: 77647.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 866/ 159576 | consumed samples: 13856 | elapsed time per iteration (ms): 13951.3 | learning rate: 3.843E-06 | global batch size: 16 | lm loss: 7.856131E+00 | loss scale: 8192.0 | grad norm: 85925.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 867/ 159576 | consumed samples: 13872 | elapsed time per iteration (ms): 14427.4 | learning rate: 3.848E-06 | global batch size: 16 | lm loss: 7.511599E+00 | loss scale: 8192.0 | grad norm: 50353.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 868/ 159576 | consumed samples: 13888 | elapsed time per iteration (ms): 14117.9 | learning rate: 3.852E-06 | global batch size: 16 | lm loss: 7.803133E+00 | loss scale: 8192.0 | grad norm: 73334.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 869/ 159576 | consumed samples: 13904 | elapsed time per iteration (ms): 13519.9 | learning rate: 3.857E-06 | global batch size: 16 | lm loss: 7.515793E+00 | loss scale: 8192.0 | grad norm: 73466.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 870/ 159576 | consumed samples: 13920 | elapsed time per iteration (ms): 13901.3 | learning rate: 3.861E-06 | global batch size: 16 | lm loss: 7.841221E+00 | loss scale: 8192.0 | grad norm: 74455.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 871/ 159576 | consumed samples: 13936 | elapsed time per iteration (ms): 14383.8 | learning rate: 3.865E-06 | global batch size: 16 | lm loss: 7.850037E+00 | loss scale: 8192.0 | grad norm: 49579.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 872/ 159576 | consumed samples: 13952 | elapsed time per iteration (ms): 14031.3 | learning rate: 3.870E-06 | global batch size: 16 | lm loss: 7.490081E+00 | loss scale: 8192.0 | grad norm: 71074.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 873/ 159576 | consumed samples: 13968 | elapsed time per iteration (ms): 13971.5 | learning rate: 3.874E-06 | global batch size: 16 | lm loss: 7.783985E+00 | loss scale: 8192.0 | grad norm: 102193.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 874/ 159576 | consumed samples: 13984 | elapsed time per iteration (ms): 14176.3 | learning rate: 3.879E-06 | global batch size: 16 | lm loss: 7.557288E+00 | loss scale: 8192.0 | grad norm: 71546.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 875/ 159576 | consumed samples: 14000 | elapsed time per iteration (ms): 14495.9 | learning rate: 3.883E-06 | global batch size: 16 | lm loss: 7.703010E+00 | loss scale: 8192.0 | grad norm: 50279.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 876/ 159576 | consumed samples: 14016 | elapsed time per iteration (ms): 13722.6 | learning rate: 3.888E-06 | global batch size: 16 | lm loss: 7.542592E+00 | loss scale: 8192.0 | grad norm: 44841.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 877/ 159576 | consumed samples: 14032 | elapsed time per iteration (ms): 13946.5 | learning rate: 3.892E-06 | global batch size: 16 | lm loss: 7.776785E+00 | loss scale: 8192.0 | grad norm: 109756.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 878/ 159576 | consumed samples: 14048 | elapsed time per iteration (ms): 13948.7 | learning rate: 3.896E-06 | global batch size: 16 | lm loss: 7.728590E+00 | loss scale: 8192.0 | grad norm: 70820.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 879/ 159576 | consumed samples: 14064 | elapsed time per iteration (ms): 13882.9 | learning rate: 3.901E-06 | global batch size: 16 | lm loss: 7.672616E+00 | loss scale: 8192.0 | grad norm: 44570.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 880/ 159576 | consumed samples: 14080 | elapsed time per iteration (ms): 14042.4 | learning rate: 3.905E-06 | global batch size: 16 | lm loss: 7.680589E+00 | loss scale: 8192.0 | grad norm: 124008.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 881/ 159576 | consumed samples: 14096 | elapsed time per iteration (ms): 13930.7 | learning rate: 3.910E-06 | global batch size: 16 | lm loss: 7.501089E+00 | loss scale: 8192.0 | grad norm: 46056.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 882/ 159576 | consumed samples: 14112 | elapsed time per iteration (ms): 14239.7 | learning rate: 3.914E-06 | global batch size: 16 | lm loss: 7.571886E+00 | loss scale: 8192.0 | grad norm: 66612.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 883/ 159576 | consumed samples: 14128 | elapsed time per iteration (ms): 13486.8 | learning rate: 3.919E-06 | global batch size: 16 | lm loss: 7.536567E+00 | loss scale: 8192.0 | grad norm: 62829.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 884/ 159576 | consumed samples: 14144 | elapsed time per iteration (ms): 14209.0 | learning rate: 3.923E-06 | global batch size: 16 | lm loss: 7.794725E+00 | loss scale: 8192.0 | grad norm: 67729.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 885/ 159576 | consumed samples: 14160 | elapsed time per iteration (ms): 13720.4 | learning rate: 3.928E-06 | global batch size: 16 | lm loss: 7.468060E+00 | loss scale: 8192.0 | grad norm: 44457.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 886/ 159576 | consumed samples: 14176 | elapsed time per iteration (ms): 13867.7 | learning rate: 3.932E-06 | global batch size: 16 | lm loss: 7.478938E+00 | loss scale: 8192.0 | grad norm: 45629.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 887/ 159576 | consumed samples: 14192 | elapsed time per iteration (ms): 13805.2 | learning rate: 3.936E-06 | global batch size: 16 | lm loss: 7.427522E+00 | loss scale: 8192.0 | grad norm: 59355.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 888/ 159576 | consumed samples: 14208 | elapsed time per iteration (ms): 14520.3 | learning rate: 3.941E-06 | global batch size: 16 | lm loss: 7.602240E+00 | loss scale: 8192.0 | grad norm: 45450.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 889/ 159576 | consumed samples: 14224 | elapsed time per iteration (ms): 13870.2 | learning rate: 3.945E-06 | global batch size: 16 | lm loss: 7.682034E+00 | loss scale: 8192.0 | grad norm: 51153.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 890/ 159576 | consumed samples: 14240 | elapsed time per iteration (ms): 13708.4 | learning rate: 3.950E-06 | global batch size: 16 | lm loss: 7.558862E+00 | loss scale: 8192.0 | grad norm: 46389.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 891/ 159576 | consumed samples: 14256 | elapsed time per iteration (ms): 13645.4 | learning rate: 3.954E-06 | global batch size: 16 | lm loss: 7.527663E+00 | loss scale: 8192.0 | grad norm: 86582.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 892/ 159576 | consumed samples: 14272 | elapsed time per iteration (ms): 13652.2 | learning rate: 3.959E-06 | global batch size: 16 | lm loss: 7.675562E+00 | loss scale: 8192.0 | grad norm: 68924.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 893/ 159576 | consumed samples: 14288 | elapsed time per iteration (ms): 14020.9 | learning rate: 3.963E-06 | global batch size: 16 | lm loss: 7.534761E+00 | loss scale: 8192.0 | grad norm: 47359.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 894/ 159576 | consumed samples: 14304 | elapsed time per iteration (ms): 13841.4 | learning rate: 3.967E-06 | global batch size: 16 | lm loss: 7.447322E+00 | loss scale: 8192.0 | grad norm: 51692.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 895/ 159576 | consumed samples: 14320 | elapsed time per iteration (ms): 14037.6 | learning rate: 3.972E-06 | global batch size: 16 | lm loss: 7.507210E+00 | loss scale: 8192.0 | grad norm: 64045.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 896/ 159576 | consumed samples: 14336 | elapsed time per iteration (ms): 14109.9 | learning rate: 3.976E-06 | global batch size: 16 | lm loss: 7.523023E+00 | loss scale: 8192.0 | grad norm: 62130.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 897/ 159576 | consumed samples: 14352 | elapsed time per iteration (ms): 14567.0 | learning rate: 3.981E-06 | global batch size: 16 | lm loss: 7.609581E+00 | loss scale: 8192.0 | grad norm: 45111.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 898/ 159576 | consumed samples: 14368 | elapsed time per iteration (ms): 13613.4 | learning rate: 3.985E-06 | global batch size: 16 | lm loss: 7.677504E+00 | loss scale: 8192.0 | grad norm: 77037.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 899/ 159576 | consumed samples: 14384 | elapsed time per iteration (ms): 13889.7 | learning rate: 3.990E-06 | global batch size: 16 | lm loss: 7.463535E+00 | loss scale: 8192.0 | grad norm: 63218.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 900/ 159576 | consumed samples: 14400 | elapsed time per iteration (ms): 13953.1 | learning rate: 3.994E-06 | global batch size: 16 | lm loss: 7.512316E+00 | loss scale: 8192.0 | grad norm: 45889.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 901/ 159576 | consumed samples: 14416 | elapsed time per iteration (ms): 14162.8 | learning rate: 3.999E-06 | global batch size: 16 | lm loss: 7.882708E+00 | loss scale: 8192.0 | grad norm: 42823.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 902/ 159576 | consumed samples: 14432 | elapsed time per iteration (ms): 13923.6 | learning rate: 4.003E-06 | global batch size: 16 | lm loss: 7.662213E+00 | loss scale: 8192.0 | grad norm: 61513.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 903/ 159576 | consumed samples: 14448 | elapsed time per iteration (ms): 14309.5 | learning rate: 4.007E-06 | global batch size: 16 | lm loss: 7.560106E+00 | loss scale: 8192.0 | grad norm: 69145.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 904/ 159576 | consumed samples: 14464 | elapsed time per iteration (ms): 13872.6 | learning rate: 4.012E-06 | global batch size: 16 | lm loss: 7.580536E+00 | loss scale: 8192.0 | grad norm: 50555.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 905/ 159576 | consumed samples: 14480 | elapsed time per iteration (ms): 13660.1 | learning rate: 4.016E-06 | global batch size: 16 | lm loss: 7.370582E+00 | loss scale: 8192.0 | grad norm: 58747.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 906/ 159576 | consumed samples: 14496 | elapsed time per iteration (ms): 14302.6 | learning rate: 4.021E-06 | global batch size: 16 | lm loss: 7.578561E+00 | loss scale: 8192.0 | grad norm: 51271.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 907/ 159576 | consumed samples: 14512 | elapsed time per iteration (ms): 13761.7 | learning rate: 4.025E-06 | global batch size: 16 | lm loss: 7.886317E+00 | loss scale: 8192.0 | grad norm: 103662.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 908/ 159576 | consumed samples: 14528 | elapsed time per iteration (ms): 13804.9 | learning rate: 4.030E-06 | global batch size: 16 | lm loss: 7.671743E+00 | loss scale: 8192.0 | grad norm: 73682.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 909/ 159576 | consumed samples: 14544 | elapsed time per iteration (ms): 13551.5 | learning rate: 4.034E-06 | global batch size: 16 | lm loss: 7.644366E+00 | loss scale: 8192.0 | grad norm: 44749.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 910/ 159576 | consumed samples: 14560 | elapsed time per iteration (ms): 14145.8 | learning rate: 4.038E-06 | global batch size: 16 | lm loss: 7.575992E+00 | loss scale: 8192.0 | grad norm: 123440.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 911/ 159576 | consumed samples: 14576 | elapsed time per iteration (ms): 13697.4 | learning rate: 4.043E-06 | global batch size: 16 | lm loss: 7.622074E+00 | loss scale: 8192.0 | grad norm: 106507.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 912/ 159576 | consumed samples: 14592 | elapsed time per iteration (ms): 13234.0 | learning rate: 4.047E-06 | global batch size: 16 | lm loss: 7.362756E+00 | loss scale: 8192.0 | grad norm: 47407.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 913/ 159576 | consumed samples: 14608 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.052E-06 | global batch size: 16 | lm loss: 7.463619E+00 | loss scale: 8192.0 | grad norm: 52603.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 914/ 159576 | consumed samples: 14624 | elapsed time per iteration (ms): 13866.4 | learning rate: 4.056E-06 | global batch size: 16 | lm loss: 7.559254E+00 | loss scale: 8192.0 | grad norm: 75070.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 915/ 159576 | consumed samples: 14640 | elapsed time per iteration (ms): 13445.5 | learning rate: 4.061E-06 | global batch size: 16 | lm loss: 7.466935E+00 | loss scale: 8192.0 | grad norm: 84703.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 916/ 159576 | consumed samples: 14656 | elapsed time per iteration (ms): 13592.3 | learning rate: 4.065E-06 | global batch size: 16 | lm loss: 7.530110E+00 | loss scale: 8192.0 | grad norm: 68897.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 917/ 159576 | consumed samples: 14672 | elapsed time per iteration (ms): 13623.0 | learning rate: 4.070E-06 | global batch size: 16 | lm loss: 7.709665E+00 | loss scale: 8192.0 | grad norm: 42674.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 918/ 159576 | consumed samples: 14688 | elapsed time per iteration (ms): 13933.4 | learning rate: 4.074E-06 | global batch size: 16 | lm loss: 7.340624E+00 | loss scale: 8192.0 | grad norm: 62308.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 919/ 159576 | consumed samples: 14704 | elapsed time per iteration (ms): 13383.8 | learning rate: 4.078E-06 | global batch size: 16 | lm loss: 7.633225E+00 | loss scale: 8192.0 | grad norm: 101681.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 920/ 159576 | consumed samples: 14720 | elapsed time per iteration (ms): 13577.7 | learning rate: 4.083E-06 | global batch size: 16 | lm loss: 7.753546E+00 | loss scale: 8192.0 | grad norm: 64758.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 921/ 159576 | consumed samples: 14736 | elapsed time per iteration (ms): 13615.2 | learning rate: 4.087E-06 | global batch size: 16 | lm loss: 7.587958E+00 | loss scale: 8192.0 | grad norm: 50894.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 922/ 159576 | consumed samples: 14752 | elapsed time per iteration (ms): 13349.8 | learning rate: 4.092E-06 | global batch size: 16 | lm loss: 7.769899E+00 | loss scale: 8192.0 | grad norm: 142837.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 923/ 159576 | consumed samples: 14768 | elapsed time per iteration (ms): 13909.6 | learning rate: 4.096E-06 | global batch size: 16 | lm loss: 7.624977E+00 | loss scale: 8192.0 | grad norm: 83848.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 924/ 159576 | consumed samples: 14784 | elapsed time per iteration (ms): 13544.9 | learning rate: 4.101E-06 | global batch size: 16 | lm loss: 7.603238E+00 | loss scale: 8192.0 | grad norm: 56820.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 925/ 159576 | consumed samples: 14800 | elapsed time per iteration (ms): 14229.7 | learning rate: 4.105E-06 | global batch size: 16 | lm loss: 7.706733E+00 | loss scale: 8192.0 | grad norm: 76791.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 926/ 159576 | consumed samples: 14816 | elapsed time per iteration (ms): 13216.1 | learning rate: 4.109E-06 | global batch size: 16 | lm loss: 7.619715E+00 | loss scale: 8192.0 | grad norm: 71541.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 927/ 159576 | consumed samples: 14832 | elapsed time per iteration (ms): 13878.1 | learning rate: 4.114E-06 | global batch size: 16 | lm loss: 7.712871E+00 | loss scale: 8192.0 | grad norm: 73909.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 928/ 159576 | consumed samples: 14848 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.118E-06 | global batch size: 16 | lm loss: 7.413386E+00 | loss scale: 8192.0 | grad norm: 57651.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 929/ 159576 | consumed samples: 14864 | elapsed time per iteration (ms): 13472.5 | learning rate: 4.123E-06 | global batch size: 16 | lm loss: 7.559020E+00 | loss scale: 8192.0 | grad norm: 91128.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 930/ 159576 | consumed samples: 14880 | elapsed time per iteration (ms): 13393.9 | learning rate: 4.127E-06 | global batch size: 16 | lm loss: 7.636448E+00 | loss scale: 8192.0 | grad norm: 48957.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 931/ 159576 | consumed samples: 14896 | elapsed time per iteration (ms): 13547.0 | learning rate: 4.132E-06 | global batch size: 16 | lm loss: 7.639730E+00 | loss scale: 8192.0 | grad norm: 110788.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 932/ 159576 | consumed samples: 14912 | elapsed time per iteration (ms): 14018.3 | learning rate: 4.136E-06 | global batch size: 16 | lm loss: 7.652531E+00 | loss scale: 8192.0 | grad norm: 96359.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 933/ 159576 | consumed samples: 14928 | elapsed time per iteration (ms): 13449.4 | learning rate: 4.141E-06 | global batch size: 16 | lm loss: 7.671719E+00 | loss scale: 8192.0 | grad norm: 60936.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 934/ 159576 | consumed samples: 14944 | elapsed time per iteration (ms): 13624.9 | learning rate: 4.145E-06 | global batch size: 16 | lm loss: 7.672961E+00 | loss scale: 8192.0 | grad norm: 45848.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 935/ 159576 | consumed samples: 14960 | elapsed time per iteration (ms): 13787.5 | learning rate: 4.149E-06 | global batch size: 16 | lm loss: 7.740889E+00 | loss scale: 8192.0 | grad norm: 140359.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 936/ 159576 | consumed samples: 14976 | elapsed time per iteration (ms): 13643.3 | learning rate: 4.154E-06 | global batch size: 16 | lm loss: 7.595088E+00 | loss scale: 8192.0 | grad norm: 125926.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 937/ 159576 | consumed samples: 14992 | elapsed time per iteration (ms): 13588.2 | learning rate: 4.158E-06 | global batch size: 16 | lm loss: 7.580822E+00 | loss scale: 8192.0 | grad norm: 88915.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 938/ 159576 | consumed samples: 15008 | elapsed time per iteration (ms): 13606.3 | learning rate: 4.163E-06 | global batch size: 16 | lm loss: 7.766950E+00 | loss scale: 8192.0 | grad norm: 88671.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 939/ 159576 | consumed samples: 15024 | elapsed time per iteration (ms): 13894.4 | learning rate: 4.167E-06 | global batch size: 16 | lm loss: 7.578055E+00 | loss scale: 8192.0 | grad norm: 66434.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 940/ 159576 | consumed samples: 15040 | elapsed time per iteration (ms): 13885.0 | learning rate: 4.172E-06 | global batch size: 16 | lm loss: 7.837738E+00 | loss scale: 8192.0 | grad norm: 64490.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 941/ 159576 | consumed samples: 15056 | elapsed time per iteration (ms): 14127.9 | learning rate: 4.176E-06 | global batch size: 16 | lm loss: 7.961911E+00 | loss scale: 8192.0 | grad norm: 155493.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 942/ 159576 | consumed samples: 15072 | elapsed time per iteration (ms): 14120.5 | learning rate: 4.180E-06 | global batch size: 16 | lm loss: 7.581886E+00 | loss scale: 8192.0 | grad norm: 84829.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
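[Editor's note: the records above are uniform enough to inspect mechanically. A minimal parsing sketch, assuming the log has been saved one record per line to a file; the filename tr8-104B.log and the spike threshold are illustrative, not from the original run.]

import re

# Field names below are copied verbatim from the Megatron log format above.
RECORD = re.compile(
    r"iteration\s+(?P<iter>\d+)/\s*\d+\s*\|"
    r".*?lm loss: (?P<loss>[\d.E+-]+)\s*\|"
    r".*?grad norm: (?P<gnorm>[\d.]+)"
)

def parse_log(path):
    """Yield (iteration, lm_loss, grad_norm) from a Megatron-style log file."""
    with open(path) as f:
        for line in f:
            m = RECORD.search(line)
            if m:
                yield (int(m.group("iter")),
                       float(m.group("loss")),
                       float(m.group("gnorm")))

# Example use: flag iterations whose grad norm spikes well above the median,
# e.g. the jumps past 100k visible in the records above.
records = list(parse_log("tr8-104B.log"))  # hypothetical filename
if records:
    norms = sorted(g for _, _, g in records)
    median = norms[len(norms) // 2]
    for it, loss, gnorm in records:
        if gnorm > 2 * median:
            print(f"iteration {it}: lm loss {loss:.4f}, grad norm {gnorm:,.0f}")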
saving checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 05:51:49,558] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step942/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 942 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17459.68
[exiting program after 110.12040019432703 minutes] datetime: 2021-09-24 05:52:01
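[Editor's note: the "[exiting program after ... minutes]" line indicates a deliberate checkpoint-and-exit before the job's wall-clock limit, after which a new job resumes from global_step942. The rank-0 model-states file named above is an ordinary torch pickle, so its top-level layout can be peeked at directly. A minimal sketch, assuming PyTorch is installed and the path is readable; the key layout varies across DeepSpeed versions, so this only prints what is actually there.]

import torch

# Path copied from the checkpoint-save log line above.
CKPT = ("/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/"
        "global_step942/mp_rank_00_model_states.pt")

# map_location="cpu" avoids needing a GPU just to inspect the file.
state = torch.load(CKPT, map_location="cpu")
for key, value in state.items():
    print(key, type(value).__name__)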
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
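Every rank emits an identical copy of the op report above; it can also be regenerated on demand from a single process. A minimal sketch, assuming DeepSpeed is installed (`ds_report` is the console script DeepSpeed ships; calling it via subprocess avoids depending on internal module paths):

```python
# Minimal sketch: regenerate the op report above from one process,
# instead of reading it interleaved across every rank's stdout.
import subprocess

# Equivalent to typing `ds_report` in the shell.
subprocess.run(["ds_report"], check=True)
```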
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
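The async_io op shows [NO]/[NO] because libaio is not present on the compute nodes, so the op can neither JIT-compile nor run. Before rebuilding, one can confirm whether the shared library is visible to the dynamic linker; a sketch using only the standard library (nothing here is DeepSpeed-specific):

```python
# Minimal sketch: check whether libaio is visible to the dynamic linker,
# which is what the async_io op needs at JIT-build time.
from ctypes.util import find_library

libaio = find_library("aio")
if libaio is None:
    # Matches the warning above: install the dev package and rebuild,
    # e.g. `apt install libaio-dev` as the log suggests.
    print("libaio not found; async_io will stay [NO]")
else:
    print(f"libaio found: {libaio}")
```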
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
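The same fields can be collected programmatically, which is handy when diffing environments across ranks or nodes; a sketch using only public torch and deepspeed attributes:

```python
# Minimal sketch: gather the fields the environment report prints,
# using only public attributes of the installed packages.
import os
import torch
import deepspeed

print("torch install path ....", os.path.dirname(torch.__file__))
print("torch version .........", torch.__version__)      # e.g. 1.8.1
print("torch cuda version ....", torch.version.cuda)     # e.g. 11.1
print("deepspeed install path ", list(deepspeed.__path__))
print("deepspeed info ........", deepspeed.__version__)  # e.g. 0.4.2+bc17042
```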
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
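`git` is not on PATH inside the job environment, so Megatron falls back to `unknown` for both fields. A sketch of the usual probe pattern behind such a message (the `get_git_info` helper name is hypothetical, not Megatron's actual API):

```python
# Minimal sketch of a git-info probe: shell out to git and fall back to
# "unknown" when git is absent, mirroring the log line above.
import subprocess

def get_git_info():  # hypothetical helper, for illustration only
    def run(*args):
        try:
            out = subprocess.run(["git", *args], capture_output=True,
                                 text=True, check=True)
            return out.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            return "unknown"  # git missing from PATH, or not a repo
    return (run("rev-parse", "--short", "HEAD"),
            run("rev-parse", "--abbrev-ref", "HEAD"))

git_hash, git_branch = get_git_info()
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")
```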
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed install path................... ........... 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... torch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] op name ................ 
installed .. compatible deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]-------------------------------------------------- fused_lamb DeepSpeed C++/CUDA extension op report............. 
[NO]-------------------------------------------------- .......NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. [OKAY] -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... 
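The same report can be regenerated on demand from a login shell with DeepSpeed's ds_report utility, without launching a job. A minimal sketch, assuming the tr1-13B conda environment is activatable by name and using the DeepSpeed checkout listed above; the DS_BUILD_* switches are standard DeepSpeed build flags, and prebuilding is optional since missing ops are JIT-compiled on first use:

    conda activate tr1-13B      # environment listed in the report above
    ds_report                   # prints the same op report + environment info

    # optional: prebuild selected ops instead of relying on JIT compilation
    cd /gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science
    DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 pip install -e . --no-cache-dir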
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[both messages repeated once per rank; duplicates elided]
[NO] ....... [OKAY] JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 stochastic_transformer . [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ...............-------------------------------------------------- [YES]op name ...................... [OKAY]installed .. compatible -------------------------------------------------- fused_adam ............. cpu_adam[NO] ............... .......[YES] [OKAY]...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attnfused_lamb ......................... [NO][NO] .............. [OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ stochastic_transformer[NO] ....... .[OKAY] [NO] ....... [OKAY]transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system transformer ............ [NO] ....... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch version .................... 1.8.1 stochastic_transformer . [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch version .................... 1.8.1 -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system fused_lamb ............. [NO] ....... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ 
[NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] DeepSpeed general environment info: -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- torch install path DeepSpeed general environment info:............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install path cpu_adam ............... [YES] ...... [OKAY] torch install path...............torch version ................................... 1.8.1 torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... fused_adam ............. [NO] ....... [OKAY] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 torch version nvcc versiontorch version.................... .........................................1.8.1 11.21.8.1 torch cuda versiondeepspeed install path torch cuda version.......................... ...............11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1nvcc version fused_lamb ............. [NO] ....... [OKAY] deepspeed infonvcc version..................... ........................................11.2 0.4.2+bc17042, bc17042, big-science11.2deepspeed install path deepspeed wheel compiled w.deepspeed install path........... ................. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. sparse_attn ............ [NO] ....... [OKAY] ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: ninja .................. [OKAY] -------------------------------------------------- torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 op name ................ installed .. compatible torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 -------------------------------------------------- deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]ninja .................. fused_lamb[OKAY] ............. --------------------------------------------------[NO] .......op name [OKAY]................ installed .. compatible -------------------------------------------------- sparse_attn ............cpu_adam [NO]............... .......[YES] [OKAY]...... [OKAY]transformer ............ [NO] ....... [OKAY] stochastic_transformerfused_adam .............. [NO][NO] .............. [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ 
installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 utilsutils .................................... [YES] [YES]...... ......[OKAY] [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
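async_io is the only op reported as both not installed and not compatible: libaio is missing on the compute nodes, and `apt install libaio-dev` needs root there, so the op simply stays disabled (it is only needed for asynchronous NVMe/CPU offload). A quick node-local check for the prerequisite (an illustrative probe of ours, not DeepSpeed API):

```python
# Illustrative probe (not DeepSpeed API): check whether libaio, the
# async_io prerequisite, is resolvable on this node.
import ctypes.util

# DeepSpeed's async_io op links against libaio; when the library cannot
# be found, ds_report marks the op [NO]/[NO] as in the log above.
print("libaio found:", ctypes.util.find_library("aio") is not None)
```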
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

At startup Megatron shells out to git to record the hash and branch of the checkout; git is not in the job's PATH (hence `type: git: not found`), so both fields fall back to "unknown". Harmless, but it means the log itself does not pin the exact Megatron commit.
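In essence the probe looks like this, as a minimal sketch with a hypothetical helper rather than Megatron's actual code:

```python
# Minimal sketch (hypothetical helper, not Megatron's actual code):
# ask git for hash/branch and fall back to "unknown" when git is absent.
import subprocess

def git_info():
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).decode().strip()
        git_branch = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"]).decode().strip()
        return git_hash, git_branch
    except (OSError, subprocess.CalledProcessError):
        # git missing from PATH, as on these compute nodes
        return "unknown", "unknown"

print("**** Git info for Megatron: git_hash={} git_branch={} ****".format(*git_info()))
```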
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed C++/CUDA extension op report deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] stochastic_transformer . [NO] ....... [OKAY] [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. 
[NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]ninja utils .................. [YES] ...... [OKAY] transformer.................. ............ [NO][OKAY] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name stochastic_transformer................ installed. ..[NO] .......compatible [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... utils[OKAY] async_io ...............async_io [NO] ...................... [NO][NO] .................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] ....... [NO] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
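This block is what the `ds_report` tool prints; a minimal sketch for regenerating it from Python, assuming `deepspeed.env_report` exposes `main()` as in the 0.4.x releases:

```python
# Minimal sketch: reproduce the "DeepSpeed general environment info" block.
# Assumes deepspeed.env_report exposes main(), as in DeepSpeed 0.4.x.
import torch
import deepspeed
from deepspeed.env_report import main as ds_report

print(torch.__version__, torch.version.cuda)  # 1.8.1, 11.1 in this run
print(deepspeed.__version__)                  # 0.4.2+bc17042
ds_report()                                   # full ops + environment table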
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
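The `git: not found` line explains the unknown hash and branch: the compute nodes have no `git` on PATH, so Megatron's git lookup falls back to "unknown". A hypothetical sketch of that fallback (illustrative only, not Megatron's actual code):

```python
# Hypothetical helper showing why the log reads git_hash=unknown:
# shelling out to git fails when the binary is absent from PATH
# (the "/bin/sh: ... git: not found" line), so we fall back.
import subprocess

def git_info(args):
    try:
        out = subprocess.check_output(["git"] + args,
                                      stderr=subprocess.DEVNULL)
        return out.decode().strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"  # git missing or not a repo -> matches the log

git_hash = git_info(["rev-parse", "--short", "HEAD"])
git_branch = git_info(["rev-parse", "--abbrev-ref", "HEAD"])
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")
```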
1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES]ninja ...... ..................[OKAY] [OKAY] -------------------------------------------------- op name ................ installed ..fused_adam compatible............. --------------------------------------------------[NO] ....... [OKAY] cpu_adamfused_lamb ............... .............[YES] [NO]...... .......[OKAY] [OKAY] fused_adamsparse_attn ......................... [NO] [NO]....... .......[OKAY] [OKAY] fused_lambtransformer ......................... [NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... sparse_attn ............ [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ......quantizer [OKAY].............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] ninja-------------------------------------------------- .................. op name[OKAY] ................ installed-------------------------------------------------- .. op namecompatible ................-------------------------------------------------- installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. [NO] ....... fused_lamb[OKAY] ............. [NO] fused_lamb....... .............[OKAY] [NO] ....... [OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [NO] .......transformer [OKAY]............ [NO] ....... transformer[OKAY] ............ [NO] ....... [OKAY]stochastic_transformer .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY][0m -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 async_io ............... [NO] ....... [NO] nvcc versionnvcc version .......................................... 11.211.2 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... 
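For anyone reproducing this setup: the environment block above is essentially a dump of a few importable facts, and DeepSpeed's `ds_report` utility prints the authoritative version, including the nvcc probe and the per-op compatibility table. A minimal Python sketch of just the version lines (paths printed will of course differ per cluster):

```python
# Minimal sketch of the checks behind "DeepSpeed general environment info".
# `ds_report` is the authoritative tool; this covers only the version lines.
import torch
import deepspeed

print("torch install path ...............", torch.__path__)
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
print("deepspeed install path ...........", deepspeed.__path__)
print("deepspeed info ...................", deepspeed.__version__)
```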
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1164492.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
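To make the rampup message concrete: `rampup_batch_size = ['16', '16', '6_000_000']` says to start at a global batch size of 16 and add 16 each time a fixed share of the 6,000,000 ramp-up samples has been consumed, reaching 2048 at the end of the ramp. A sketch of that schedule (this mirrors the printed message, not necessarily Megatron's exact rounding):

```python
# Hedged sketch of the batch-size rampup implied by
# rampup_batch_size = ['16', '16', '6_000_000'] in the arguments above.
start, increment, ramp_samples = 16, 16, 6_000_000
target = 2048  # global_batch_size from the arguments above

n_sizes = (target - start) // increment + 1            # 128 sizes: 16, 32, ..., 2048
samples_per_increment = ramp_samples // (n_sizes - 1)  # ~47_244 samples per size

def global_batch_size(consumed_samples: int) -> int:
    """Global batch size after `consumed_samples` training samples."""
    steps = min(consumed_samples // samples_per_increment, n_sizes - 1)
    return start + steps * increment

assert global_batch_size(0) == 16
assert global_batch_size(6_000_000) == 2048
```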
> setting tensorboard ...
[... further identical per-rank op reports, async_io warnings, environment info, and git warnings elided ...]
...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch version .................... 1.8.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- --------------------------------------------------op name ................op name installed................ .. installedcompatible .. --------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [YES][OKAY] ...... [OKAY] ninja .................. [OKAY] fused_adam-------------------------------------------------- fused_adam.............op name [NO]............................. installed.......[NO] [OKAY]......... compatible[OKAY] fused_lamb --------------------------------------------------............. [NO]fused_lamb .................... [OKAY][NO] cpu_adam.......ninja ............... [OKAY]..................[YES] [OKAY]......sparse_attn [OKAY] ............ --------------------------------------------------[NO] .......op name sparse_attn[OKAY]................ fused_adam installed......................... ..transformer[NO][NO] .......................... [OKAY][NO][OKAY]compatible .......transformer --------------------------------------------------[OKAY]fused_lamb............ .............[NO] stochastic_transformer[NO]....... .......[OKAY] . [OKAY] [NO]cpu_adamstochastic_transformer ...................... . [OKAY] sparse_attn[YES] [NO] .................. [NO][OKAY]....... [OKAY] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adamstochastic_transformer .............. [NO][NO] .............. [OKAY] [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
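Note: every rank prints the same op report, and it can be regenerated at any time with the ds_report command that ships with DeepSpeed. As a minimal sketch of what the columns mean, assuming the deepspeed.ops.op_builder interface of the 0.4.x wheel used here (this is not code from the run itself): each op has a builder object; "installed" means the op was pre-compiled into the wheel, "compatible" means the dependencies allow a ninja-based JIT build on first use.

# Minimal sketch, assuming the deepspeed.ops.op_builder interface of the
# DeepSpeed 0.4.x wheel used in this run; not taken from the training code.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the "compatible" column: the op can
    # be JIT-compiled (which needs ninja) even if it isn't pre-installed.
    print(type(builder).__name__, builder.is_compatible())

# builder.load() compiles (if needed) and imports the extension on first
# use, e.g. CPUAdamBuilder().load() backs the cpu_adam op reported above.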
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
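The "type: git: not found" lines explain the unknown git info: Megatron probes for git through /bin/sh to stamp the log with the code revision, and git is not on the compute nodes' PATH, so hash and branch fall back to unknown. A sketch of that fallback logic follows; get_git_info is an illustrative helper, not Megatron's actual function.

# Illustrative sketch of the fallback behind the "Git info for Megatron"
# line; get_git_info is a hypothetical helper, not Megatron's real code.
import subprocess

def get_git_info():
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
        git_branch = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        # git missing from PATH (as on these compute nodes) or not a repo
        git_hash = git_branch = "unknown"
    return git_hash, git_branch

print("**** Git info for Megatron: git_hash=%s git_branch=%s ****" % get_git_info())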
11.1 nvcc version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']..................... 11.2 torch versiondeepspeed install path ............................... 1.8.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda versiondeepspeed info .................................. 11.10.4.2+bc17042, bc17042, big-science nvcc versiondeepspeed wheel compiled w. ........................... 11.2torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
> setting codecarbon ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
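Ops listed as installed `[NO]` but compatible `[OKAY]` in the report above are built lazily on first use. As a hedged illustration (a sketch, not tied to what this particular run compiled), constructing `FusedAdam` from `deepspeed.ops.adam` is one path that triggers such a ninja JIT build:

```python
# Sketch: fused_adam shows installed=[NO], compatible=[OKAY] above,
# so its CUDA extension is ninja-built the first time it is used.
import torch
from deepspeed.ops.adam import FusedAdam

params = [torch.nn.Parameter(torch.randn(16, 16, device="cuda"))]
optimizer = FusedAdam(params, lr=1e-4)  # JIT-compiles the op if needed
```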
> initializing torch distributed ...
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 8 > setting random seeds to 42 ... 
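For orientation: with tensor model parallel 4 and pipeline model parallel 8 across the 256 processes enumerated in the topology dump further below, the data parallel size works out to 256 / (4 × 8) = 8. Here is a minimal sketch of the rank layout that dump implies; the row-major (pipe, data, model) ordering is my reading of the dump, and the helper name is made up, not Megatron/DeepSpeed API:

```python
# Minimal sketch of the rank layout implied by the topology dump below.
# Assumption: ranks are assigned row-major over (pipe, data, model), which
# matches e.g. ProcessCoord(pipe=1, data=0, model=0) -> 32 and
# ProcessCoord(pipe=2, data=0, model=0) -> 64 in the dump.
TP = 4   # tensor model parallel size (from the init lines above)
PP = 8   # pipeline model parallel size
DP = 8   # data parallel size: 256 total processes / (TP * PP)

def global_rank(pipe: int, data: int, model: int) -> int:
    """Hypothetical helper: flatten a (pipe, data, model) coordinate."""
    return pipe * (DP * TP) + data * TP + model

# Spot checks against entries that appear verbatim in the dump:
assert global_rank(0, 0, 3) == 3
assert global_rank(1, 0, 0) == 32
assert global_rank(7, 7, 3) == 255
```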
[2021-09-24 05:52:24,592] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.299 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 19.795 seconds
time to initialize megatron (seconds): 12.702
[after megatron is initialized] datetime: 2021-09-24 05:52:44
building GPT model ...
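The UserWarning above (emitted once per rank, hence the flood) comes from torch.utils.cpp_extension's compiler check: Megatron builds its fused kernels just-in-time, and the build found a `c++` binary that is not the g++ PyTorch was built with. A hedged sketch of what that JIT build boils down to; the source file names are illustrative rather than the actual Megatron sources, and exporting CXX=g++ before launch is one plausible way to satisfy the check, assuming g++ exists on the node:

```python
# Hedged sketch of a torch cpp_extension JIT build like the one in the log.
import os
from torch.utils import cpp_extension

# cpp_extension picks the host compiler from the CXX env var (default "c++");
# the warning fires because that "c++" is not the g++ PyTorch was built with.
os.environ.setdefault("CXX", "g++")  # assumption: g++ is on PATH

scaled_softmax = cpp_extension.load(
    name="scaled_masked_softmax_cuda",        # module name from the log
    sources=[
        "scaled_masked_softmax.cpp",          # illustrative file names
        "scaled_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,  # prints the same "Emitting ninja build file ..." lines
)
```

Since the build directory already held up-to-date artifacts, ninja reports "no work to do" and each module loads in a few seconds.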
[2021-09-24 05:52:44,769] [INFO] [utils.py:680:see_memory_usage] Before Building Model /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved warnings.warn( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved warnings.warn( [2021-09-24 05:52:44,770] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-09-24 05:52:44,770] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.83 GB, percent = 20.2% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, 
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-24 05:52:46,176] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=7 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=1 layers=4 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=2 layers=4 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=3 layers=4 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=4 layers=4 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=5 layers=4 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=6 layers=4 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=7 layers=8 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe 35: 36: MixedFusedLayerNorm 37: EmbeddingPipe 38: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 
1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560 [2021-09-24 05:52:47,386] [INFO] [utils.py:680:see_memory_usage] After Building Model [2021-09-24 05:52:47,387] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB [2021-09-24 05:52:47,388] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.02 GB, percent = 20.3% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792 setting training iterations to 159576 > learning rate decay style: cosine DeepSpeed is enabled. 
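Every figure in the sketch below is taken from the log itself: the per-rank parameter counts just printed (stage 0 carries the embedding, stage 7 the final layer norm plus the tied embedding, per the partition listing above), the TOTAL/UNIQUE numbers in the engine-init lines that follow, and the batch settings in the config dump that follows. A quick consistency check:

```python
# Back-of-the-envelope check of the parameter counts reported in this log.
# Per tensor-parallel rank, the pipeline has 8 stages:
first  = 1_986_465_792   # stage 0: _to_float16 + EmbeddingPipe + 4 layers
middle = 1_745_293_312   # stages 1..6: 4 ParallelTransformerLayerPipe each
last   = 1_986_498_560   # stage 7: 4 layers + MixedFusedLayerNorm + EmbeddingPipe

per_tp_rank = first + 6 * middle + last
total = per_tp_rank * 4          # 4 tensor-parallel ranks
assert total == 57_778_896_896   # TOTAL_PARAMS printed by the engine below

# TOTAL minus UNIQUE is presumably the weights shared across stages
# (the tied input/output embedding, counted on both stage 0 and stage 7):
unique = 56_814_206_976          # UNIQUE_PARAMS printed by the engine below
assert total - unique == 964_689_920

# Global batch size lines up with the config dump that follows:
# train_batch_size = micro_batch_size * gradient_accumulation_steps * DP
assert 1 * 256 * 8 == 2048
```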
[2021-09-24 05:52:47,464] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science [2021-09-24 05:52:47,544] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False [2021-09-24 05:52:47,544] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-09-24 05:52:47,544] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer [2021-09-24 05:52:47,545] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-09-24 05:52:47,545] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-09-24 05:52:47,545] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-09-24 05:52:47,545] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000 [2021-09-24 05:52:47,545] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000 [2021-09-24 05:52:47,545] [INFO] [stage2.py:108:__init__] CPU Offload: False [2021-09-24 05:52:47,545] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False [2021-09-24 05:52:52,071] [INFO] [stage2.py:419:__init__] optimizer state initialized [2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-09-24 05:52:52,071] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-09-24 05:52:52,071] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-09-24 05:52:52,072] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-09-24 05:52:52,072] [INFO] [config.py:900:print] DeepSpeedEngine configuration: [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_enabled .................. False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] amp_params ................... False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] disable_allgather ............ False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dump_state ................... False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_enabled ........... False [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1 [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0 [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 
100 [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06 [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01 [2021-09-24 05:52:52,072] [INFO] [config.py:904:print] eigenvalue_verbose ........... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] elasticity_enabled ........... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_enabled ................. True [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] global_rank .................. 0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_clipping ............ 1.0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] loss_scale ................... 0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] memory_breakdown ............. False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_name ............... None [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] optimizer_params ............. None [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_enabled .................. False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] pld_params ................... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] prescale_gradients ........... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_groups .............. 1 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_offset .............. 1000 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_period .............. 1000 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_rounding ............ 0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_start_bits .......... 16 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_target_bits ......... 8 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_training_enabled .... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_type ................ 0 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] quantize_verbose ............. False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_name ............... None [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] scheduler_params ............. None [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_attention ............. None [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] steps_per_print .............. 
2000 [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_enabled .......... False [2021-09-24 05:52:52,073] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] tensorboard_output_path ...... [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_batch_size ............. 2048 [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1 [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] use_quantizer_kernel ......... False [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] wall_clock_breakdown ......... False [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] world_size ................... 8 [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_allow_untested_optimizer False [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_enabled ................. True [2021-09-24 05:52:52,074] [INFO] [config.py:904:print] zero_optimization_stage ...... 1 [2021-09-24 05:52:52,074] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-24 05:52:52,074] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1 [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) 
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-24 05:52:52,378] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
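The RANK lines describe the pipeline layout: 8 stages covering layer slots [0, 7) through [31, 39), with the first and last stages (~1.99B parameters per GPU) larger than the middle ones (~1.75B) because they also hold the embedding layers. The STAGE_PARAMS counts are per tensor-parallel rank; multiplying their sum by a tensor-parallel degree of 4 (an inference from the arithmetic below, not printed in this excerpt) reproduces TOTAL_PARAMS exactly, and with 8 pipeline stages and data parallelism of 8 that gives 4 × 8 × 8 = 256 ranks, matching the rank numbers in the log. TOTAL_PARAMS − UNIQUE_PARAMS is the weight replication across stages, chiefly the tied input/output embedding:

```python
# Arithmetic on the engine.py:134 printout above (all constants copied from the log).
stage_params = [1_986_465_792] + 6 * [1_745_293_312] + [1_986_498_560]  # stages 0..7, per GPU
total_params = 57_778_896_896
unique_params = 56_814_206_976

tensor_parallel = 4  # inferred: the exact ratio of TOTAL_PARAMS to the per-GPU stage sum
assert sum(stage_params) * tensor_parallel == total_params
print(total_params - unique_params)  # 964,689,920 params replicated across stages
```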
successfully loaded 8 ZeRO state_dicts for ranks 168, 171, 176, 88, 170, 132, 156, 169, 159, 124, 32, 49, 96, 167, 127, 60, 148, 48, 99, 140, 144, 104, 112, 68, 120
loading 8 zero partition checkpoints for rank 168
successfully loaded 8 ZeRO state_dicts for ranks 193, 210, 69, 52, 157, 40, 129, 201, 209, 145, 111, 211, 135, 141, 139, 172, 80, 215, 106, 187, 137, 133, 90, 74, 34, 143, 200, 122, 125, 228, 81, 105, 163, 64, 186, 97, 70, 51, 77, 160, 50, 202, 98, 20, 85, 89, 214, 114, 149, 123, 71, 126, 152, 203, 166, 41, 222, 130, 216, 84, 100, 42, 190, 12, 44, 108, 219, 206, 128, 37, 33, 56, 62, 115, 24, 45, 192, 153, 134, 136, 38, 131, 121
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 05:53:20 CEST)" was missed by 0:00:03.058626
successfully loaded 8 ZeRO state_dicts for ranks 217, 146, 195, 82, 191, 113, 158, 208
loading 8 zero partition checkpoints for rank 176
successfully loaded 8 ZeRO state_dicts for ranks 65, 78, 93, 188
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 05:53:20 CEST)" was missed by 0:00:03.434951
successfully loaded 8 ZeRO state_dicts for ranks 162, 63, 61, 221, 107, 179, 147, 36
loading 8 zero partition checkpoints for rank 132
successfully loaded 8 ZeRO state_dicts for ranks 116, 199
loading 8 zero partition checkpoints for ranks 88, 170
successfully loaded 8 ZeRO state_dicts for ranks 151, 76, 35, 223, 175, 13, 207, 218, 213, 119, 198, 164
loading 8 zero partition checkpoints for rank 159
successfully loaded 8 ZeRO state_dicts for ranks 109, 197, 66, 22, 185, 196, 43, 204, 205, 181, 25, 91, 212, 173, 39, 161, 29, 26, 180, 28, 87, 53, 194, 54, 73, 21, 27, 46, 67
loading 8 zero partition checkpoints for rank 32
successfully loaded 8 ZeRO state_dicts for ranks 184, 165, 118, 220, 57, 75, 0, 92
loading 8 zero partition checkpoints for rank 124
successfully loaded 8 ZeRO state_dicts for ranks 94, 55, 72, 83, 6, 86, 189, 5, 117, 4, 30, 155, 1, 110, 58, 79, 101, 177, 2
loading 8 zero partition checkpoints for rank 167
successfully loaded 8 ZeRO state_dicts for ranks 95, 227
loading 8 zero partition checkpoints for rank 171
successfully loaded 8 ZeRO state_dicts for ranks 103, 142
loading 8 zero partition checkpoints for rank 96
successfully loaded 8 ZeRO state_dicts for rank 10
loading 8 zero partition checkpoints for rank 127
successfully loaded 8 ZeRO state_dicts for ranks 31, 178, 3, 154, 47, 59, 23, 15
loading 8 zero partition checkpoints for rank 148
successfully loaded 8 ZeRO state_dicts for ranks 182, 14, 252, 236, 224, 183
loading 8 zero partition checkpoints for rank 144
successfully loaded 8 ZeRO state_dicts for rank 138
loading 8 zero partition checkpoints for rank 99
successfully loaded 8 ZeRO state_dicts for rank 230
loading 8 zero partition checkpoints for rank 120
successfully loaded 8 ZeRO state_dicts for rank 238
loading 8 zero partition checkpoints for rank 156
successfully loaded 8 ZeRO state_dicts for ranks 226, 8, 231, 243, 246, 150, 239, 250
loading 8 zero partition checkpoints for rank 104
successfully loaded 8 ZeRO state_dicts for ranks 242, 234
loading 8 zero partition checkpoints for rank 140
successfully loaded 8 ZeRO state_dicts for rank 240
loading 8 zero partition checkpoints for rank 193
successfully loaded 8 ZeRO state_dicts for rank 254
loading 8 zero partition checkpoints for rank 169
successfully loaded 8 ZeRO state_dicts for ranks 244, 9
loading 8 zero partition checkpoints for rank 112
successfully loaded 8 ZeRO state_dicts for ranks 7, 241
loading 8 zero partition checkpoints for rank 69
successfully loaded 8 ZeRO state_dicts for ranks 237, 174
loading 8 zero partition checkpoints for rank 201
successfully loaded 8 ZeRO state_dicts for ranks 229, 248, 235, 253
loading 8 zero partition checkpoints for ranks 209, 40, 60
successfully loaded 8 ZeRO state_dicts for rank 225
loading 8 zero partition checkpoints for rank 80
successfully loaded 8 ZeRO state_dicts for ranks 232, 255, 247
loading 8 zero partition checkpoints for ranks 90, 143
successfully loaded 8 ZeRO state_dicts for ranks 251, 233
loading 8 zero partition checkpoints for ranks 125, 34, 106
successfully loaded 8 ZeRO state_dicts for rank 245
loading 8 zero partition checkpoints for ranks 137, 81
successfully loaded 8 ZeRO state_dicts for rank 102
loading 8 zero partition checkpoints for ranks 187, 215
successfully loaded 8 ZeRO state_dicts for rank 249
loading 8 zero partition checkpoints for ranks 186, 105, 64, 74, 160, 216, 77, 139, 149, 89, 114, 152
loading 8 zero partition checkpoints for ranks 42, 108, 228, 206, 33, 41, 135, 71, 222, 62, 134
successfully loaded 8 ZeRO state_dicts for rank 11
loading 8 zero partition checkpoints for ranks 129, 126, 192, 153, 202, 128, 84, 141, 45, 115, 56, 111, 121, 130, 20, 133, 38, 122, 97, 158, 85, 157, 78, 162, 191, 65, 44, 82, 98, 63, 12, 113, 188, 151, 146, 36, 123, 210, 37, 119, 197, 223, 52, 179, 76, 218, 219, 35, 107, 163, 43, 212, 49, 208, 181, 91, 185, 214, 53, 75, 46, 165, 57, 211, 180, 55, 217, 92, 61, 110, 196, 205, 83, 25, 68, 195, 118, 79, 155, 184, 94, 39, 27, 21, 58, 103, 100, 101, 154, 131, 145, 0, 136
checkpoint version 3.0
loading 8 zero partition checkpoints for ranks 48, 51, 29, 109, 213, 93, 183, 72, 59, 200, 73, 142, 182, 70, 161, 150, 5, 203, 194, 190, 6, 54, 47, 221, 4, 138, 50, 3, 177, 30, 15, 166, 226, 238, 207, 22, 147, 87, 178, 172, 204, 66, 250, 220, 254, 95, 239, 24, 86, 189, 229, 241, 240, 253, 199, 67, 175, 225, 164, 246, 236, 198, 247, 233, 116, 7, 248, 232, 230, 173, 231, 244, 117, 102, 26, 23, 245, 237, 227, 28, 252, 13, 1, 174, 242, 224, 2, 31, 243, 14, 234, 255, 235, 251, 10, 249, 9, 8, 11
successfully loaded 8 ZeRO state_dicts for ranks 17, 19, 18, 16
loading 8 zero partition checkpoints for ranks 17, 19, 18, 16
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 942
time (ms) | load-checkpoint: 82978.97
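Every rank reports loading 8 ZeRO state_dicts because ZeRO stage 1 shards the optimizer state across the 8 data-parallel replicas and elastic_checkpoint is enabled: on resume each rank reads all 8 optimizer shards for its model-parallel slice and then keeps only its own partition, which is part of why the full restore still costs ~83 s. A minimal sketch of that access pattern, with an illustrative file layout (the real Megatron-DeepSpeed checkpoint naming may differ):

```python
import glob
import torch

def load_zero_shards(ckpt_dir: str, dp_world_size: int = 8):
    """Read all data-parallel optimizer shards for this rank's model slice.

    The glob pattern below is illustrative, not the exact checkpoint layout.
    """
    paths = sorted(glob.glob(f"{ckpt_dir}/zero_pp_rank_*_optim_states.pt"))
    assert len(paths) == dp_world_size, f"expected {dp_world_size} shards, got {len(paths)}"
    return [torch.load(p, map_location="cpu") for p in paths]
```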
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-24 05:54:15
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.135933 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.348 seconds
    total number of samples: 394611670
    total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.321 seconds
    total number of samples: 6927161
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.062 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-24 05:54:21
done with setup ...
training ...
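The split and sample counts above are internally consistent, and they fix the token budget of the run; a quick check on the logged values:

```python
# Consistency checks on the dataset printout (pure arithmetic on logged values).
train_docs, valid_docs, test_docs = 288_714_672, 15_211_521, 304_230
assert train_docs + valid_docs + test_docs == 304_230_423  # every document is assigned

target_samples, seq_len = 300_000_000, 2048
print(target_samples * seq_len)        # 614,400,000,000 -> a ~614B-token training target
assert target_samples <= 394_611_670   # 3 epochs of the train split cover the target
```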
time (ms) | model-and-optimizer-setup: 91017.54 | train/valid/test-data-iterators-setup: 4740.91
[before the start of training step] datetime: 2021-09-24 05:54:21
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-24 05:54:21,235] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False

Per-rank GPU memory after 943 iterations (all values in MB):

rank   allocated         max allocated      reserved   max reserved
1      6661.611328125    11742.55810546875  22890.0    22890.0
225    7107.70751953125  11884.6845703125   22108.0    22108.0
65     5861.5498046875   10450.46337890625  18442.0    18442.0
33     5861.5498046875   10450.46337890625  18442.0    18442.0
97     5861.5498046875   10450.46337890625  18442.0    18442.0
129    5861.5498046875   10450.46337890625  18442.0    18442.0
193    5861.5498046875   10450.46337890625  18586.0    18586.0
161    5861.5498046875   10450.46337890625  18442.0    18442.0
2      6661.611328125    11742.55810546875  21150.0    21150.0
34     5861.5498046875   10450.46337890625  18442.0    18442.0
226    7107.70751953125  11884.6845703125   21700.0    21700.0
66     5861.5498046875   10450.46337890625  18586.0    18586.0
98     5861.5498046875   10450.46337890625  18442.0    18442.0
162    5861.5498046875   10450.46337890625  18442.0    18442.0
130    5861.5498046875   10450.46337890625  18458.0    18458.0
194    5861.5498046875   10450.46337890625  18826.0    18826.0
0      6661.611328125    11742.55810546875  23526.0    23526.0
32     5861.5498046875   10450.46337890625  19012.0    19012.0
64     5861.5498046875   10450.46337890625  19012.0    19012.0
224    7107.70751953125  11884.6845703125   22492.0    22492.0
96     5861.5498046875   10450.46337890625  18948.0    18948.0
128    5861.5498046875   10450.46337890625  19012.0    19012.0
192    5861.5498046875   10450.46337890625  19076.0    19076.0
160    5861.5498046875   10450.46337890625  19012.0    19012.0
3      6661.611328125    11742.55810546875  21150.0    21150.0
35     5861.5498046875   10450.46337890625  18826.0    18826.0
227    7107.70751953125  11884.6845703125   22492.0    22492.0
67     5861.5498046875   10450.46337890625  18458.0    18458.0
99     5861.5498046875   10450.46337890625  18522.0    18522.0
163    5861.5498046875   10450.46337890625  18442.0    18442.0
131    5861.5498046875   10450.46337890625  18442.0    18442.0
195    5861.5498046875   10450.46337890625  18826.0    18826.0

Training iteration log, iterations 943-1015 of 159576 (every iteration below also reported: global batch size: 16 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0):

iteration  samples  ms/iter  learning rate  lm loss       loss scale  grad norm
943        15088    29806.1  4.185E-06      7.642442E+00  8192.0      53639.718
944        15104    13012.2  4.189E-06      7.638637E+00  8192.0      47002.321
945        15120    13551.8  4.194E-06      7.559312E+00  8192.0      43680.206
946        15136    13672.0  4.198E-06      7.372701E+00  8192.0      29642.562
947        15152    13523.5  4.203E-06      7.431667E+00  8192.0      71525.963
948        15168    13571.1  4.207E-06      7.622519E+00  8192.0      108314.372
949        15184    13513.7  4.212E-06      7.491040E+00  8192.0      83775.616
950        15200    13857.2  4.216E-06      7.689845E+00  8192.0      42694.796
951        15216    13556.0  4.220E-06      7.541234E+00  8192.0      36744.623
952        15232    13565.0  4.225E-06      7.402619E+00  8192.0      37335.008
953        15248    13600.8  4.229E-06      7.524664E+00  8192.0      36490.188
954        15264    13538.1  4.234E-06      6.926525E+00  8192.0      28573.010
955        15280    13767.3  4.238E-06      7.564863E+00  8192.0      45556.471
956        15296    13529.6  4.243E-06      7.518897E+00  8192.0      40483.089
957        15312    13548.2  4.247E-06      7.292015E+00  8192.0      27123.950
958        15328    13592.2  4.251E-06      7.645267E+00  8192.0      45895.591
959        15344    13834.7  4.256E-06      7.439256E+00  8192.0      47827.958
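Iteration 943, the first after the restart, took 29.8 s; after that the time settles around 13.5 s per iteration. During this batch-size ramp-up phase the global batch is only 16 sequences, so whole-job throughput is modest:

```python
# Throughput implied by the iteration log above (global batch 16, seq len 2048).
global_batch_size, seq_len = 16, 2048
sec_per_iter = 13.55  # typical steady-state value from the log

samples_per_sec = global_batch_size / sec_per_iter
print(f"{samples_per_sec:.2f} samples/s, {samples_per_sec * seq_len:,.0f} tokens/s")
# -> ~1.18 samples/s, ~2,418 tokens/s across the whole 256-GPU job
```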
iteration  samples  ms/iter  learning rate  lm loss       loss scale  grad norm
960        15360    13548.7  4.260E-06      7.398325E+00  8192.0      41514.249
961        15376    13540.1  4.265E-06      7.498395E+00  8192.0      24323.912
962        15392    13596.3  4.269E-06      7.458749E+00  8192.0      37806.541
963        15408    13925.1  4.274E-06      7.414832E+00  8192.0      38291.446
964        15424    13505.9  4.278E-06      7.552760E+00  8192.0      23290.618
965        15440    13598.7  4.283E-06      7.566991E+00  8192.0      33429.496
966        15456    13495.5  4.287E-06      7.727429E+00  8192.0      33196.940
967        15472    13508.3  4.291E-06      7.517751E+00  8192.0      25674.592
968        15488    13747.8  4.296E-06      7.534285E+00  8192.0      28899.517
969        15504    13541.9  4.300E-06      7.412315E+00  8192.0      23856.723
970        15520    13581.6  4.305E-06      7.574214E+00  8192.0      26912.399
971        15536    13575.2  4.309E-06      7.489717E+00  8192.0      25683.773
iteration  samples  ms/iter  learning rate  lm loss       loss scale  grad norm
972        15552    14047.8  4.314E-06      7.479139E+00  8192.0      23963.457
973        15568    13519.1  4.318E-06      7.557629E+00  8192.0      28281.687
974        15584    13508.3  4.322E-06      7.324095E+00  8192.0      24628.133
975        15600    13557.4  4.327E-06      7.551218E+00  8192.0      22604.906
976        15616    13573.2  4.331E-06      7.421384E+00  8192.0      25754.693
977        15632    13891.1  4.336E-06      7.421275E+00  8192.0      23427.022
978        15648    13578.3  4.340E-06      7.468715E+00  8192.0      25697.467
979        15664    13602.5  4.345E-06      7.679566E+00  8192.0      25403.982
980        15680    13628.8  4.349E-06      7.442289E+00  8192.0      30230.032
981        15696    13812.5  4.354E-06      7.521616E+00  8192.0      29030.478
982        15712    13617.0  4.358E-06      7.595479E+00  8192.0      32518.623

[2021-09-24 06:03:44] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 06:03:44] PULSE: tr8-104B is running for 11:33 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])

iteration  samples  ms/iter  learning rate  lm loss       loss scale  grad norm
983        15728    13560.9  4.362E-06      7.437976E+00  8192.0      25658.380
984        15744    13555.5  4.367E-06      7.561976E+00  8192.0      28146.514
985        15760    13993.9  4.371E-06      7.526425E+00  8192.0      22789.409
986        15776    13819.4  4.376E-06      7.568769E+00  8192.0      29742.595
987        15792    13655.7  4.380E-06      7.516987E+00  8192.0      29352.083
988        15808    13528.1  4.385E-06      7.482485E+00  8192.0      23020.708
989        15824    13534.2  4.389E-06      7.601320E+00  8192.0      23202.245
990        15840    13617.6  4.393E-06      7.522967E+00  8192.0      26298.479
991        15856    13569.7  4.398E-06      7.564295E+00  8192.0      30127.017
992        15872    13596.4  4.402E-06      7.530395E+00  8192.0      25061.967
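The learning-rate column is exactly the linear warmup implied by the checkpoint values loaded earlier: max lr 6e-05 over a warmup length of 216320, which matches the logged numbers only if that length is counted in samples rather than steps.

```python
# Verify the linear LR warmup against logged values: lr = 6e-05 * consumed_samples / 216320.
max_lr, warmup_samples = 6e-05, 216_320

for consumed_samples, logged_lr in [(15088, 4.185e-06), (15728, 4.362e-06), (15872, 4.402e-06)]:
    lr = max_lr * consumed_samples / warmup_samples
    assert abs(lr - logged_lr) < 1e-09  # agrees to the log's printed precision
```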
iteration  samples  ms/iter  learning rate  lm loss       loss scale  grad norm
993        15888    13641.4  4.407E-06      7.547958E+00  8192.0      24314.301
994        15904    13912.4  4.411E-06      7.429228E+00  8192.0      28339.027
995        15920    13541.6  4.416E-06      7.511089E+00  8192.0      27156.396
996        15936    13577.4  4.420E-06      7.332575E+00  8192.0      26750.652
997        15952    13524.5  4.425E-06      7.478838E+00  8192.0      30934.480
998        15968    13570.2  4.429E-06      7.363966E+00  8192.0      26717.522
999        15984    13808.8  4.433E-06      7.504936E+00  8192.0      33504.939
1000       16000    13740.5  4.438E-06      7.441235E+00  16384.0     39922.218

------------------------------------------------------------------------------------------------
validation loss at iteration 1000 | lm loss value: 7.422922E+00 | lm loss PPL: 1.673917E+03 |
------------------------------------------------------------------------------------------------
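Two things worth noting at iteration 1000: the reported perplexity is simply exp(lm loss), and the dynamic fp16 loss scale has been raised from 8192 to 16384, which the scaler does after a sustained run of overflow-free steps (loss_scale_window: 500 in the fp16 config above).

```python
import math

# The validation PPL is exp(lm loss); lm loss is printed rounded, so match approximately.
lm_loss = 7.422922
ppl = math.exp(lm_loss)
print(f"{ppl:.6E}")                  # ~1.673911E+03 vs logged 1.673917E+03
assert abs(ppl - 1.673917e+03) < 0.5
```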
of nan iterations: 0 | time (ms) iteration 1004/ 159576 | consumed samples: 16064 | elapsed time per iteration (ms): 13547.9 | learning rate: 4.456E-06 | global batch size: 16 | lm loss: 7.476449E+00 | loss scale: 16384.0 | grad norm: 53109.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1005/ 159576 | consumed samples: 16080 | elapsed time per iteration (ms): 13546.4 | learning rate: 4.460E-06 | global batch size: 16 | lm loss: 7.394112E+00 | loss scale: 16384.0 | grad norm: 62368.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1006/ 159576 | consumed samples: 16096 | elapsed time per iteration (ms): 13685.8 | learning rate: 4.464E-06 | global batch size: 16 | lm loss: 7.426886E+00 | loss scale: 16384.0 | grad norm: 57003.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1007/ 159576 | consumed samples: 16112 | elapsed time per iteration (ms): 14078.3 | learning rate: 4.469E-06 | global batch size: 16 | lm loss: 7.601004E+00 | loss scale: 16384.0 | grad norm: 62664.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1008/ 159576 | consumed samples: 16128 | elapsed time per iteration (ms): 13787.6 | learning rate: 4.473E-06 | global batch size: 16 | lm loss: 7.774883E+00 | loss scale: 16384.0 | grad norm: 97296.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1009/ 159576 | consumed samples: 16144 | elapsed time per iteration (ms): 13687.7 | learning rate: 4.478E-06 | global batch size: 16 | lm loss: 7.604346E+00 | loss scale: 16384.0 | grad norm: 65941.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1010/ 159576 | consumed samples: 16160 | elapsed time per iteration (ms): 13703.4 | learning rate: 4.482E-06 | global batch size: 16 | lm loss: 7.360181E+00 | loss scale: 16384.0 | grad norm: 64245.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1011/ 159576 | consumed samples: 16176 | elapsed time per iteration (ms): 14077.4 | learning rate: 4.487E-06 | global batch size: 16 | lm loss: 7.590093E+00 | loss scale: 16384.0 | grad norm: 66963.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1012/ 159576 | consumed samples: 16192 | elapsed time per iteration (ms): 13697.2 | learning rate: 4.491E-06 | global batch size: 16 | lm loss: 7.648331E+00 | loss scale: 16384.0 | grad norm: 62407.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1013/ 159576 | consumed samples: 16208 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.496E-06 | global batch size: 16 | lm loss: 7.462048E+00 | loss scale: 16384.0 | grad norm: 76557.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1014/ 159576 | consumed samples: 16224 | elapsed time per iteration (ms): 13713.9 | learning rate: 4.500E-06 | global batch size: 16 | lm loss: 7.345057E+00 | loss scale: 16384.0 | grad norm: 58991.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1015/ 159576 | consumed samples: 16240 | elapsed time per iteration (ms): 13740.6 | learning rate: 4.504E-06 | global batch size: 16 | lm loss: 7.369339E+00 
| loss scale: 16384.0 | grad norm: 76798.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1016/ 159576 | consumed samples: 16256 | elapsed time per iteration (ms): 13921.9 | learning rate: 4.509E-06 | global batch size: 16 | lm loss: 7.564117E+00 | loss scale: 16384.0 | grad norm: 64166.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1017/ 159576 | consumed samples: 16272 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.513E-06 | global batch size: 16 | lm loss: 7.610378E+00 | loss scale: 16384.0 | grad norm: 65353.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1018/ 159576 | consumed samples: 16288 | elapsed time per iteration (ms): 13686.4 | learning rate: 4.518E-06 | global batch size: 16 | lm loss: 7.676594E+00 | loss scale: 16384.0 | grad norm: 64547.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1019/ 159576 | consumed samples: 16304 | elapsed time per iteration (ms): 13717.6 | learning rate: 4.522E-06 | global batch size: 16 | lm loss: 7.406422E+00 | loss scale: 16384.0 | grad norm: 63594.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1020/ 159576 | consumed samples: 16320 | elapsed time per iteration (ms): 13939.6 | learning rate: 4.527E-06 | global batch size: 16 | lm loss: 7.459125E+00 | loss scale: 16384.0 | grad norm: 59823.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1021/ 159576 | consumed samples: 16336 | elapsed time per iteration (ms): 13792.3 | learning rate: 4.531E-06 | global batch size: 16 | lm loss: 7.471806E+00 | loss scale: 16384.0 | grad norm: 56872.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1022/ 159576 | consumed samples: 16352 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.536E-06 | global batch size: 16 | lm loss: 7.110139E+00 | loss scale: 16384.0 | grad norm: 58937.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1023/ 159576 | consumed samples: 16368 | elapsed time per iteration (ms): 13711.6 | learning rate: 4.540E-06 | global batch size: 16 | lm loss: 7.428498E+00 | loss scale: 16384.0 | grad norm: 57885.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1024/ 159576 | consumed samples: 16384 | elapsed time per iteration (ms): 14207.9 | learning rate: 4.544E-06 | global batch size: 16 | lm loss: 7.374810E+00 | loss scale: 16384.0 | grad norm: 56855.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1025/ 159576 | consumed samples: 16400 | elapsed time per iteration (ms): 13557.2 | learning rate: 4.549E-06 | global batch size: 16 | lm loss: 7.597025E+00 | loss scale: 16384.0 | grad norm: 57119.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1026/ 159576 | consumed samples: 16416 | elapsed time per iteration (ms): 13700.8 | learning rate: 4.553E-06 | global batch size: 16 | lm loss: 7.473170E+00 | loss scale: 16384.0 | grad norm: 61762.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1027/ 159576 | consumed samples: 16432 | elapsed 
time per iteration (ms): 13696.5 | learning rate: 4.558E-06 | global batch size: 16 | lm loss: 7.410631E+00 | loss scale: 16384.0 | grad norm: 63393.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1028/ 159576 | consumed samples: 16448 | elapsed time per iteration (ms): 13664.5 | learning rate: 4.562E-06 | global batch size: 16 | lm loss: 7.475993E+00 | loss scale: 16384.0 | grad norm: 61819.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1029/ 159576 | consumed samples: 16464 | elapsed time per iteration (ms): 13836.3 | learning rate: 4.567E-06 | global batch size: 16 | lm loss: 7.464800E+00 | loss scale: 16384.0 | grad norm: 52336.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1030/ 159576 | consumed samples: 16480 | elapsed time per iteration (ms): 13692.5 | learning rate: 4.571E-06 | global batch size: 16 | lm loss: 7.449406E+00 | loss scale: 16384.0 | grad norm: 66491.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1031/ 159576 | consumed samples: 16496 | elapsed time per iteration (ms): 13635.2 | learning rate: 4.575E-06 | global batch size: 16 | lm loss: 7.519850E+00 | loss scale: 16384.0 | grad norm: 65780.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1032/ 159576 | consumed samples: 16512 | elapsed time per iteration (ms): 13708.9 | learning rate: 4.580E-06 | global batch size: 16 | lm loss: 7.513804E+00 | loss scale: 16384.0 | grad norm: 62434.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1033/ 159576 | consumed samples: 16528 | elapsed time per iteration (ms): 13952.8 | learning rate: 4.584E-06 | global batch size: 16 | lm loss: 7.405169E+00 | loss scale: 16384.0 | grad norm: 74264.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1034/ 159576 | consumed samples: 16544 | elapsed time per iteration (ms): 13788.4 | learning rate: 4.589E-06 | global batch size: 16 | lm loss: 7.367761E+00 | loss scale: 16384.0 | grad norm: 75791.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1035/ 159576 | consumed samples: 16560 | elapsed time per iteration (ms): 13716.5 | learning rate: 4.593E-06 | global batch size: 16 | lm loss: 7.513783E+00 | loss scale: 16384.0 | grad norm: 91765.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1036/ 159576 | consumed samples: 16576 | elapsed time per iteration (ms): 13658.1 | learning rate: 4.598E-06 | global batch size: 16 | lm loss: 7.556536E+00 | loss scale: 16384.0 | grad norm: 76354.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1037/ 159576 | consumed samples: 16592 | elapsed time per iteration (ms): 13995.5 | learning rate: 4.602E-06 | global batch size: 16 | lm loss: 7.423755E+00 | loss scale: 16384.0 | grad norm: 70528.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1038/ 159576 | consumed samples: 16608 | elapsed time per iteration (ms): 13797.2 | learning rate: 4.607E-06 | global batch size: 16 | lm loss: 7.452043E+00 | loss scale: 16384.0 | grad norm: 63200.280 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1039/ 159576 | consumed samples: 16624 | elapsed time per iteration (ms): 13728.6 | learning rate: 4.611E-06 | global batch size: 16 | lm loss: 7.310857E+00 | loss scale: 16384.0 | grad norm: 135045.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1040/ 159576 | consumed samples: 16640 | elapsed time per iteration (ms): 13690.2 | learning rate: 4.615E-06 | global batch size: 16 | lm loss: 7.374257E+00 | loss scale: 16384.0 | grad norm: 69159.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1041/ 159576 | consumed samples: 16656 | elapsed time per iteration (ms): 13682.9 | learning rate: 4.620E-06 | global batch size: 16 | lm loss: 7.498551E+00 | loss scale: 16384.0 | grad norm: 67982.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1042/ 159576 | consumed samples: 16672 | elapsed time per iteration (ms): 13991.8 | learning rate: 4.624E-06 | global batch size: 16 | lm loss: 7.373695E+00 | loss scale: 16384.0 | grad norm: 75175.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1043/ 159576 | consumed samples: 16688 | elapsed time per iteration (ms): 13721.4 | learning rate: 4.629E-06 | global batch size: 16 | lm loss: 7.642927E+00 | loss scale: 16384.0 | grad norm: 103318.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1044/ 159576 | consumed samples: 16704 | elapsed time per iteration (ms): 13718.3 | learning rate: 4.633E-06 | global batch size: 16 | lm loss: 7.423826E+00 | loss scale: 16384.0 | grad norm: 71060.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1045/ 159576 | consumed samples: 16720 | elapsed time per iteration (ms): 13604.4 | learning rate: 4.638E-06 | global batch size: 16 | lm loss: 7.362212E+00 | loss scale: 16384.0 | grad norm: 81169.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1046/ 159576 | consumed samples: 16736 | elapsed time per iteration (ms): 14075.1 | learning rate: 4.642E-06 | global batch size: 16 | lm loss: 7.450203E+00 | loss scale: 16384.0 | grad norm: 83510.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1047/ 159576 | consumed samples: 16752 | elapsed time per iteration (ms): 13677.3 | learning rate: 4.646E-06 | global batch size: 16 | lm loss: 7.554290E+00 | loss scale: 16384.0 | grad norm: 81988.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1048/ 159576 | consumed samples: 16768 | elapsed time per iteration (ms): 13606.4 | learning rate: 4.651E-06 | global batch size: 16 | lm loss: 7.327914E+00 | loss scale: 16384.0 | grad norm: 71618.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1049/ 159576 | consumed samples: 16784 | elapsed time per iteration (ms): 13669.1 | learning rate: 4.655E-06 | global batch size: 16 | lm loss: 7.596028E+00 | loss scale: 16384.0 | grad norm: 76665.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1050/ 159576 | consumed samples: 16800 | elapsed time per iteration (ms): 13708.7 | learning rate: 4.660E-06 | global batch size: 
16 | lm loss: 7.326102E+00 | loss scale: 16384.0 | grad norm: 83331.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1051/ 159576 | consumed samples: 16816 | elapsed time per iteration (ms): 13981.1 | learning rate: 4.664E-06 | global batch size: 16 | lm loss: 7.619492E+00 | loss scale: 16384.0 | grad norm: 82397.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1052/ 159576 | consumed samples: 16832 | elapsed time per iteration (ms): 13516.4 | learning rate: 4.669E-06 | global batch size: 16 | lm loss: 7.530663E+00 | loss scale: 16384.0 | grad norm: 56319.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1053/ 159576 | consumed samples: 16848 | elapsed time per iteration (ms): 13647.6 | learning rate: 4.673E-06 | global batch size: 16 | lm loss: 7.443875E+00 | loss scale: 16384.0 | grad norm: 72562.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1054/ 159576 | consumed samples: 16864 | elapsed time per iteration (ms): 13627.5 | learning rate: 4.678E-06 | global batch size: 16 | lm loss: 7.479875E+00 | loss scale: 16384.0 | grad norm: 61495.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1055/ 159576 | consumed samples: 16880 | elapsed time per iteration (ms): 14065.0 | learning rate: 4.682E-06 | global batch size: 16 | lm loss: 7.612121E+00 | loss scale: 16384.0 | grad norm: 112310.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1056/ 159576 | consumed samples: 16896 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.686E-06 | global batch size: 16 | lm loss: 7.408166E+00 | loss scale: 16384.0 | grad norm: 92018.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1057/ 159576 | consumed samples: 16912 | elapsed time per iteration (ms): 13656.1 | learning rate: 4.691E-06 | global batch size: 16 | lm loss: 7.422934E+00 | loss scale: 16384.0 | grad norm: 67279.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1058/ 159576 | consumed samples: 16928 | elapsed time per iteration (ms): 13676.8 | learning rate: 4.695E-06 | global batch size: 16 | lm loss: 7.397638E+00 | loss scale: 16384.0 | grad norm: 87601.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1059/ 159576 | consumed samples: 16944 | elapsed time per iteration (ms): 14053.0 | learning rate: 4.700E-06 | global batch size: 16 | lm loss: 7.514566E+00 | loss scale: 16384.0 | grad norm: 115639.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1060/ 159576 | consumed samples: 16960 | elapsed time per iteration (ms): 13722.6 | learning rate: 4.704E-06 | global batch size: 16 | lm loss: 7.310302E+00 | loss scale: 16384.0 | grad norm: 142865.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1061/ 159576 | consumed samples: 16976 | elapsed time per iteration (ms): 13679.9 | learning rate: 4.709E-06 | global batch size: 16 | lm loss: 7.399222E+00 | loss scale: 16384.0 | grad norm: 100646.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1062/ 159576 | 
consumed samples: 16992 | elapsed time per iteration (ms): 13634.5 | learning rate: 4.713E-06 | global batch size: 16 | lm loss: 7.332808E+00 | loss scale: 16384.0 | grad norm: 66218.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1063/ 159576 | consumed samples: 17008 | elapsed time per iteration (ms): 13663.6 | learning rate: 4.717E-06 | global batch size: 16 | lm loss: 7.490856E+00 | loss scale: 16384.0 | grad norm: 127442.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1064/ 159576 | consumed samples: 17024 | elapsed time per iteration (ms): 13909.0 | learning rate: 4.722E-06 | global batch size: 16 | lm loss: 7.693977E+00 | loss scale: 16384.0 | grad norm: 101533.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1065/ 159576 | consumed samples: 17040 | elapsed time per iteration (ms): 13658.8 | learning rate: 4.726E-06 | global batch size: 16 | lm loss: 7.565272E+00 | loss scale: 16384.0 | grad norm: 87035.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1066/ 159576 | consumed samples: 17056 | elapsed time per iteration (ms): 13679.2 | learning rate: 4.731E-06 | global batch size: 16 | lm loss: 7.790638E+00 | loss scale: 16384.0 | grad norm: 86411.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1067/ 159576 | consumed samples: 17072 | elapsed time per iteration (ms): 13759.2 | learning rate: 4.735E-06 | global batch size: 16 | lm loss: 7.438931E+00 | loss scale: 16384.0 | grad norm: 65756.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1068/ 159576 | consumed samples: 17088 | elapsed time per iteration (ms): 14138.1 | learning rate: 4.740E-06 | global batch size: 16 | lm loss: 7.361547E+00 | loss scale: 16384.0 | grad norm: 130711.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1069/ 159576 | consumed samples: 17104 | elapsed time per iteration (ms): 13687.8 | learning rate: 4.744E-06 | global batch size: 16 | lm loss: 7.413251E+00 | loss scale: 16384.0 | grad norm: 58324.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1070/ 159576 | consumed samples: 17120 | elapsed time per iteration (ms): 13637.9 | learning rate: 4.749E-06 | global batch size: 16 | lm loss: 7.397507E+00 | loss scale: 16384.0 | grad norm: 89260.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1071/ 159576 | consumed samples: 17136 | elapsed time per iteration (ms): 13680.2 | learning rate: 4.753E-06 | global batch size: 16 | lm loss: 7.535676E+00 | loss scale: 16384.0 | grad norm: 74408.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1072/ 159576 | consumed samples: 17152 | elapsed time per iteration (ms): 14062.2 | learning rate: 4.757E-06 | global batch size: 16 | lm loss: 7.411667E+00 | loss scale: 16384.0 | grad norm: 77225.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1073/ 159576 | consumed samples: 17168 | elapsed time per iteration (ms): 13681.2 | learning rate: 4.762E-06 | global batch size: 16 | lm loss: 7.394706E+00 | loss scale: 16384.0 | grad norm: 78590.421 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1074/ 159576 | consumed samples: 17184 | elapsed time per iteration (ms): 13709.1 | learning rate: 4.766E-06 | global batch size: 16 | lm loss: 7.616404E+00 | loss scale: 16384.0 | grad norm: 82722.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1075/ 159576 | consumed samples: 17200 | elapsed time per iteration (ms): 13743.2 | learning rate: 4.771E-06 | global batch size: 16 | lm loss: 7.395072E+00 | loss scale: 16384.0 | grad norm: 63549.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1076/ 159576 | consumed samples: 17216 | elapsed time per iteration (ms): 13619.1 | learning rate: 4.775E-06 | global batch size: 16 | lm loss: 7.593513E+00 | loss scale: 16384.0 | grad norm: 100985.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1077/ 159576 | consumed samples: 17232 | elapsed time per iteration (ms): 13859.6 | learning rate: 4.780E-06 | global batch size: 16 | lm loss: 7.379070E+00 | loss scale: 16384.0 | grad norm: 56935.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1078/ 159576 | consumed samples: 17248 | elapsed time per iteration (ms): 13589.7 | learning rate: 4.784E-06 | global batch size: 16 | lm loss: 7.412032E+00 | loss scale: 16384.0 | grad norm: 93391.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1079/ 159576 | consumed samples: 17264 | elapsed time per iteration (ms): 13575.0 | learning rate: 4.788E-06 | global batch size: 16 | lm loss: 7.485137E+00 | loss scale: 16384.0 | grad norm: 70759.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1080/ 159576 | consumed samples: 17280 | elapsed time per iteration (ms): 13590.9 | learning rate: 4.793E-06 | global batch size: 16 | lm loss: 7.410018E+00 | loss scale: 16384.0 | grad norm: 108070.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1081/ 159576 | consumed samples: 17296 | elapsed time per iteration (ms): 13934.8 | learning rate: 4.797E-06 | global batch size: 16 | lm loss: 7.444709E+00 | loss scale: 16384.0 | grad norm: 93912.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1082/ 159576 | consumed samples: 17312 | elapsed time per iteration (ms): 13598.4 | learning rate: 4.802E-06 | global batch size: 16 | lm loss: 7.532929E+00 | loss scale: 16384.0 | grad norm: 76683.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1083/ 159576 | consumed samples: 17328 | elapsed time per iteration (ms): 13510.5 | learning rate: 4.806E-06 | global batch size: 16 | lm loss: 7.599612E+00 | loss scale: 16384.0 | grad norm: 83858.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1084/ 159576 | consumed samples: 17344 | elapsed time per iteration (ms): 13542.7 | learning rate: 4.811E-06 | global batch size: 16 | lm loss: 7.387773E+00 | loss scale: 16384.0 | grad norm: 63120.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1085/ 159576 | consumed samples: 17360 | elapsed time per iteration (ms): 13555.5 | learning rate: 
4.815E-06 | global batch size: 16 | lm loss: 7.289794E+00 | loss scale: 16384.0 | grad norm: 77022.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1086/ 159576 | consumed samples: 17376 | elapsed time per iteration (ms): 13932.5 | learning rate: 4.820E-06 | global batch size: 16 | lm loss: 7.393349E+00 | loss scale: 16384.0 | grad norm: 79433.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1087/ 159576 | consumed samples: 17392 | elapsed time per iteration (ms): 13479.9 | learning rate: 4.824E-06 | global batch size: 16 | lm loss: 7.321753E+00 | loss scale: 16384.0 | grad norm: 68970.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1088/ 159576 | consumed samples: 17408 | elapsed time per iteration (ms): 13681.0 | learning rate: 4.828E-06 | global batch size: 16 | lm loss: 7.320374E+00 | loss scale: 16384.0 | grad norm: 73549.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1089/ 159576 | consumed samples: 17424 | elapsed time per iteration (ms): 13654.0 | learning rate: 4.833E-06 | global batch size: 16 | lm loss: 7.605762E+00 | loss scale: 16384.0 | grad norm: 80374.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1090/ 159576 | consumed samples: 17440 | elapsed time per iteration (ms): 14059.3 | learning rate: 4.837E-06 | global batch size: 16 | lm loss: 7.631133E+00 | loss scale: 16384.0 | grad norm: 82954.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1091/ 159576 | consumed samples: 17456 | elapsed time per iteration (ms): 13724.8 | learning rate: 4.842E-06 | global batch size: 16 | lm loss: 7.507143E+00 | loss scale: 16384.0 | grad norm: 60066.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1092/ 159576 | consumed samples: 17472 | elapsed time per iteration (ms): 13461.4 | learning rate: 4.846E-06 | global batch size: 16 | lm loss: 7.300464E+00 | loss scale: 16384.0 | grad norm: 116487.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1093/ 159576 | consumed samples: 17488 | elapsed time per iteration (ms): 13525.0 | learning rate: 4.851E-06 | global batch size: 16 | lm loss: 7.388405E+00 | loss scale: 16384.0 | grad norm: 79147.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1094/ 159576 | consumed samples: 17504 | elapsed time per iteration (ms): 13950.4 | learning rate: 4.855E-06 | global batch size: 16 | lm loss: 7.471725E+00 | loss scale: 16384.0 | grad norm: 90987.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1095/ 159576 | consumed samples: 17520 | elapsed time per iteration (ms): 13624.6 | learning rate: 4.859E-06 | global batch size: 16 | lm loss: 7.530853E+00 | loss scale: 16384.0 | grad norm: 90057.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1096/ 159576 | consumed samples: 17536 | elapsed time per iteration (ms): 13591.9 | learning rate: 4.864E-06 | global batch size: 16 | lm loss: 7.420722E+00 | loss scale: 16384.0 | grad norm: 76037.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 1097/ 159576 | consumed samples: 17552 | elapsed time per iteration (ms): 13587.0 | learning rate: 4.868E-06 | global batch size: 16 | lm loss: 7.363769E+00 | loss scale: 16384.0 | grad norm: 107388.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1098/ 159576 | consumed samples: 17568 | elapsed time per iteration (ms): 13667.8 | learning rate: 4.873E-06 | global batch size: 16 | lm loss: 7.310038E+00 | loss scale: 16384.0 | grad norm: 72408.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1099/ 159576 | consumed samples: 17584 | elapsed time per iteration (ms): 13707.4 | learning rate: 4.877E-06 | global batch size: 16 | lm loss: 7.291698E+00 | loss scale: 16384.0 | grad norm: 69292.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1100/ 159576 | consumed samples: 17600 | elapsed time per iteration (ms): 13564.5 | learning rate: 4.882E-06 | global batch size: 16 | lm loss: 7.713614E+00 | loss scale: 16384.0 | grad norm: 87150.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1101/ 159576 | consumed samples: 17616 | elapsed time per iteration (ms): 13621.9 | learning rate: 4.886E-06 | global batch size: 16 | lm loss: 7.482057E+00 | loss scale: 16384.0 | grad norm: 61713.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1102/ 159576 | consumed samples: 17632 | elapsed time per iteration (ms): 13628.2 | learning rate: 4.891E-06 | global batch size: 16 | lm loss: 7.370234E+00 | loss scale: 16384.0 | grad norm: 83708.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1103/ 159576 | consumed samples: 17648 | elapsed time per iteration (ms): 13962.7 | learning rate: 4.895E-06 | global batch size: 16 | lm loss: 7.373138E+00 | loss scale: 16384.0 | grad norm: 75905.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1104/ 159576 | consumed samples: 17664 | elapsed time per iteration (ms): 13627.3 | learning rate: 4.899E-06 | global batch size: 16 | lm loss: 7.448909E+00 | loss scale: 16384.0 | grad norm: 135141.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1105/ 159576 | consumed samples: 17680 | elapsed time per iteration (ms): 13640.6 | learning rate: 4.904E-06 | global batch size: 16 | lm loss: 7.252520E+00 | loss scale: 16384.0 | grad norm: 73661.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1106/ 159576 | consumed samples: 17696 | elapsed time per iteration (ms): 13666.3 | learning rate: 4.908E-06 | global batch size: 16 | lm loss: 7.507257E+00 | loss scale: 16384.0 | grad norm: 108098.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1107/ 159576 | consumed samples: 17712 | elapsed time per iteration (ms): 13849.3 | learning rate: 4.913E-06 | global batch size: 16 | lm loss: 7.429738E+00 | loss scale: 16384.0 | grad norm: 99851.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1108/ 159576 | consumed samples: 17728 | elapsed time per iteration (ms): 13862.9 | learning rate: 4.917E-06 | global batch size: 16 | lm loss: 7.422798E+00 | loss scale: 16384.0 | grad 
norm: 90788.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1109/ 159576 | consumed samples: 17744 | elapsed time per iteration (ms): 13640.2 | learning rate: 4.922E-06 | global batch size: 16 | lm loss: 7.656183E+00 | loss scale: 16384.0 | grad norm: 204462.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1110/ 159576 | consumed samples: 17760 | elapsed time per iteration (ms): 13627.0 | learning rate: 4.926E-06 | global batch size: 16 | lm loss: 7.576304E+00 | loss scale: 16384.0 | grad norm: 166002.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1111/ 159576 | consumed samples: 17776 | elapsed time per iteration (ms): 13632.9 | learning rate: 4.930E-06 | global batch size: 16 | lm loss: 7.626440E+00 | loss scale: 16384.0 | grad norm: 82466.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1112/ 159576 | consumed samples: 17792 | elapsed time per iteration (ms): 13939.0 | learning rate: 4.935E-06 | global batch size: 16 | lm loss: 7.302793E+00 | loss scale: 16384.0 | grad norm: 150100.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1113/ 159576 | consumed samples: 17808 | elapsed time per iteration (ms): 13640.4 | learning rate: 4.939E-06 | global batch size: 16 | lm loss: 7.493092E+00 | loss scale: 16384.0 | grad norm: 104956.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1114/ 159576 | consumed samples: 17824 | elapsed time per iteration (ms): 13637.6 | learning rate: 4.944E-06 | global batch size: 16 | lm loss: 7.475542E+00 | loss scale: 16384.0 | grad norm: 86316.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1115/ 159576 | consumed samples: 17840 | elapsed time per iteration (ms): 13630.5 | learning rate: 4.948E-06 | global batch size: 16 | lm loss: 7.367518E+00 | loss scale: 16384.0 | grad norm: 127229.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1116/ 159576 | consumed samples: 17856 | elapsed time per iteration (ms): 13929.1 | learning rate: 4.953E-06 | global batch size: 16 | lm loss: 7.463512E+00 | loss scale: 16384.0 | grad norm: 80765.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1117/ 159576 | consumed samples: 17872 | elapsed time per iteration (ms): 13651.9 | learning rate: 4.957E-06 | global batch size: 16 | lm loss: 7.389682E+00 | loss scale: 16384.0 | grad norm: 114274.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1118/ 159576 | consumed samples: 17888 | elapsed time per iteration (ms): 13673.8 | learning rate: 4.962E-06 | global batch size: 16 | lm loss: 7.446970E+00 | loss scale: 16384.0 | grad norm: 93011.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1119/ 159576 | consumed samples: 17904 | elapsed time per iteration (ms): 13700.2 | learning rate: 4.966E-06 | global batch size: 16 | lm loss: 7.314221E+00 | loss scale: 16384.0 | grad norm: 105575.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1120/ 159576 | consumed samples: 17920 | elapsed time per iteration (ms): 
13702.7 | learning rate: 4.970E-06 | global batch size: 16 | lm loss: 7.372279E+00 | loss scale: 16384.0 | grad norm: 77507.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1121/ 159576 | consumed samples: 17936 | elapsed time per iteration (ms): 13869.6 | learning rate: 4.975E-06 | global batch size: 16 | lm loss: 7.535093E+00 | loss scale: 16384.0 | grad norm: 98620.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1122/ 159576 | consumed samples: 17952 | elapsed time per iteration (ms): 13679.6 | learning rate: 4.979E-06 | global batch size: 16 | lm loss: 8.079200E+00 | loss scale: 16384.0 | grad norm: 187332.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1123/ 159576 | consumed samples: 17968 | elapsed time per iteration (ms): 13672.8 | learning rate: 4.984E-06 | global batch size: 16 | lm loss: 7.433456E+00 | loss scale: 16384.0 | grad norm: 139834.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1124/ 159576 | consumed samples: 17984 | elapsed time per iteration (ms): 13651.7 | learning rate: 4.988E-06 | global batch size: 16 | lm loss: 7.440439E+00 | loss scale: 16384.0 | grad norm: 91486.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1125/ 159576 | consumed samples: 18000 | elapsed time per iteration (ms): 14085.1 | learning rate: 4.993E-06 | global batch size: 16 | lm loss: 7.453449E+00 | loss scale: 16384.0 | grad norm: 170685.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1126/ 159576 | consumed samples: 18016 | elapsed time per iteration (ms): 13744.0 | learning rate: 4.997E-06 | global batch size: 16 | lm loss: 7.544756E+00 | loss scale: 16384.0 | grad norm: 93482.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1127/ 159576 | consumed samples: 18032 | elapsed time per iteration (ms): 13666.9 | learning rate: 5.001E-06 | global batch size: 16 | lm loss: 7.435877E+00 | loss scale: 16384.0 | grad norm: 98259.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1128/ 159576 | consumed samples: 18048 | elapsed time per iteration (ms): 13692.7 | learning rate: 5.006E-06 | global batch size: 16 | lm loss: 7.496342E+00 | loss scale: 16384.0 | grad norm: 130279.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1129/ 159576 | consumed samples: 18064 | elapsed time per iteration (ms): 14100.4 | learning rate: 5.010E-06 | global batch size: 16 | lm loss: 7.501980E+00 | loss scale: 16384.0 | grad norm: 88561.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1130/ 159576 | consumed samples: 18080 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.015E-06 | global batch size: 16 | lm loss: 7.470133E+00 | loss scale: 16384.0 | grad norm: 155289.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1131/ 159576 | consumed samples: 18096 | elapsed time per iteration (ms): 13683.0 | learning rate: 5.019E-06 | global batch size: 16 | lm loss: 7.539918E+00 | loss scale: 16384.0 | grad norm: 89135.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 1132/ 159576 | consumed samples: 18112 | elapsed time per iteration (ms): 13643.2 | learning rate: 5.024E-06 | global batch size: 16 | lm loss: 7.537309E+00 | loss scale: 16384.0 | grad norm: 83460.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1133/ 159576 | consumed samples: 18128 | elapsed time per iteration (ms): 13758.8 | learning rate: 5.028E-06 | global batch size: 16 | lm loss: 7.445082E+00 | loss scale: 16384.0 | grad norm: 97599.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1134/ 159576 | consumed samples: 18144 | elapsed time per iteration (ms): 13842.3 | learning rate: 5.033E-06 | global batch size: 16 | lm loss: 7.533705E+00 | loss scale: 16384.0 | grad norm: 153106.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1135/ 159576 | consumed samples: 18160 | elapsed time per iteration (ms): 13641.3 | learning rate: 5.037E-06 | global batch size: 16 | lm loss: 7.351761E+00 | loss scale: 16384.0 | grad norm: 139552.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1136/ 159576 | consumed samples: 18176 | elapsed time per iteration (ms): 13757.6 | learning rate: 5.041E-06 | global batch size: 16 | lm loss: 7.386802E+00 | loss scale: 16384.0 | grad norm: 82271.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1137/ 159576 | consumed samples: 18192 | elapsed time per iteration (ms): 13590.7 | learning rate: 5.046E-06 | global batch size: 16 | lm loss: 7.276345E+00 | loss scale: 16384.0 | grad norm: 139306.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1138/ 159576 | consumed samples: 18208 | elapsed time per iteration (ms): 14099.6 | learning rate: 5.050E-06 | global batch size: 16 | lm loss: 7.489694E+00 | loss scale: 16384.0 | grad norm: 75568.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1139/ 159576 | consumed samples: 18224 | elapsed time per iteration (ms): 13765.0 | learning rate: 5.055E-06 | global batch size: 16 | lm loss: 6.968816E+00 | loss scale: 16384.0 | grad norm: 118020.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1140/ 159576 | consumed samples: 18240 | elapsed time per iteration (ms): 13662.4 | learning rate: 5.059E-06 | global batch size: 16 | lm loss: 7.446542E+00 | loss scale: 16384.0 | grad norm: 117497.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1141/ 159576 | consumed samples: 18256 | elapsed time per iteration (ms): 13747.0 | learning rate: 5.064E-06 | global batch size: 16 | lm loss: 7.328124E+00 | loss scale: 16384.0 | grad norm: 126653.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1142/ 159576 | consumed samples: 18272 | elapsed time per iteration (ms): 14086.2 | learning rate: 5.068E-06 | global batch size: 16 | lm loss: 7.359120E+00 | loss scale: 16384.0 | grad norm: 158587.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1143/ 159576 | consumed samples: 18288 | elapsed time per iteration (ms): 13785.6 | learning rate: 5.072E-06 | global batch size: 16 | lm loss: 7.289187E+00 
| loss scale: 16384.0 | grad norm: 93193.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1144/ 159576 | consumed samples: 18304 | elapsed time per iteration (ms): 13650.1 | learning rate: 5.077E-06 | global batch size: 16 | lm loss: 7.541381E+00 | loss scale: 16384.0 | grad norm: 127276.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1145/ 159576 | consumed samples: 18320 | elapsed time per iteration (ms): 13673.3 | learning rate: 5.081E-06 | global batch size: 16 | lm loss: 7.343310E+00 | loss scale: 16384.0 | grad norm: 141086.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1146/ 159576 | consumed samples: 18336 | elapsed time per iteration (ms): 13709.3 | learning rate: 5.086E-06 | global batch size: 16 | lm loss: 7.291780E+00 | loss scale: 16384.0 | grad norm: 84706.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1147/ 159576 | consumed samples: 18352 | elapsed time per iteration (ms): 13798.7 | learning rate: 5.090E-06 | global batch size: 16 | lm loss: 7.395382E+00 | loss scale: 16384.0 | grad norm: 168181.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1148/ 159576 | consumed samples: 18368 | elapsed time per iteration (ms): 13678.3 | learning rate: 5.095E-06 | global batch size: 16 | lm loss: 7.287755E+00 | loss scale: 16384.0 | grad norm: 150595.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1149/ 159576 | consumed samples: 18384 | elapsed time per iteration (ms): 13705.6 | learning rate: 5.099E-06 | global batch size: 16 | lm loss: 7.521116E+00 | loss scale: 16384.0 | grad norm: 90594.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1150/ 159576 | consumed samples: 18400 | elapsed time per iteration (ms): 13724.2 | learning rate: 5.104E-06 | global batch size: 16 | lm loss: 7.560548E+00 | loss scale: 16384.0 | grad norm: 124093.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1151/ 159576 | consumed samples: 18416 | elapsed time per iteration (ms): 14011.4 | learning rate: 5.108E-06 | global batch size: 16 | lm loss: 7.334007E+00 | loss scale: 16384.0 | grad norm: 93590.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1152/ 159576 | consumed samples: 18432 | elapsed time per iteration (ms): 13638.1 | learning rate: 5.112E-06 | global batch size: 16 | lm loss: 7.340695E+00 | loss scale: 16384.0 | grad norm: 120515.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1153/ 159576 | consumed samples: 18448 | elapsed time per iteration (ms): 13670.9 | learning rate: 5.117E-06 | global batch size: 16 | lm loss: 7.310359E+00 | loss scale: 16384.0 | grad norm: 121580.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1154/ 159576 | consumed samples: 18464 | elapsed time per iteration (ms): 13692.4 | learning rate: 5.121E-06 | global batch size: 16 | lm loss: 7.407881E+00 | loss scale: 16384.0 | grad norm: 86210.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1155/ 159576 | consumed samples: 18480 | 
elapsed time per iteration (ms): 14124.7 | learning rate: 5.126E-06 | global batch size: 16 | lm loss: 7.533539E+00 | loss scale: 16384.0 | grad norm: 117499.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1156/ 159576 | consumed samples: 18496 | elapsed time per iteration (ms): 13713.9 | learning rate: 5.130E-06 | global batch size: 16 | lm loss: 7.454373E+00 | loss scale: 16384.0 | grad norm: 82164.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1157/ 159576 | consumed samples: 18512 | elapsed time per iteration (ms): 13665.0 | learning rate: 5.135E-06 | global batch size: 16 | lm loss: 6.997806E+00 | loss scale: 16384.0 | grad norm: 118291.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1158/ 159576 | consumed samples: 18528 | elapsed time per iteration (ms): 13620.7 | learning rate: 5.139E-06 | global batch size: 16 | lm loss: 7.155181E+00 | loss scale: 16384.0 | grad norm: 80841.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1159/ 159576 | consumed samples: 18544 | elapsed time per iteration (ms): 13522.0 | learning rate: 5.143E-06 | global batch size: 16 | lm loss: 7.303053E+00 | loss scale: 16384.0 | grad norm: 153692.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1160/ 159576 | consumed samples: 18560 | elapsed time per iteration (ms): 13934.6 | learning rate: 5.148E-06 | global batch size: 16 | lm loss: 7.453541E+00 | loss scale: 16384.0 | grad norm: 178564.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1161/ 159576 | consumed samples: 18576 | elapsed time per iteration (ms): 13591.1 | learning rate: 5.152E-06 | global batch size: 16 | lm loss: 7.370741E+00 | loss scale: 16384.0 | grad norm: 96828.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1162/ 159576 | consumed samples: 18592 | elapsed time per iteration (ms): 13610.9 | learning rate: 5.157E-06 | global batch size: 16 | lm loss: 7.395625E+00 | loss scale: 16384.0 | grad norm: 138531.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1163/ 159576 | consumed samples: 18608 | elapsed time per iteration (ms): 13633.4 | learning rate: 5.161E-06 | global batch size: 16 | lm loss: 7.721334E+00 | loss scale: 16384.0 | grad norm: 107198.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1164/ 159576 | consumed samples: 18624 | elapsed time per iteration (ms): 13919.7 | learning rate: 5.166E-06 | global batch size: 16 | lm loss: 7.418262E+00 | loss scale: 16384.0 | grad norm: 104593.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1165/ 159576 | consumed samples: 18640 | elapsed time per iteration (ms): 13699.8 | learning rate: 5.170E-06 | global batch size: 16 | lm loss: 7.388452E+00 | loss scale: 16384.0 | grad norm: 87922.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1166/ 159576 | consumed samples: 18656 | elapsed time per iteration (ms): 13567.0 | learning rate: 5.175E-06 | global batch size: 16 | lm loss: 7.359789E+00 | loss scale: 16384.0 | grad norm: 167490.320 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1167/ 159576 | consumed samples: 18672 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.179E-06 | global batch size: 16 | lm loss: 7.513920E+00 | loss scale: 16384.0 | grad norm: 187148.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1168/ 159576 | consumed samples: 18688 | elapsed time per iteration (ms): 13712.9 | learning rate: 5.183E-06 | global batch size: 16 | lm loss: 7.333634E+00 | loss scale: 16384.0 | grad norm: 80524.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1169/ 159576 | consumed samples: 18704 | elapsed time per iteration (ms): 13807.4 | learning rate: 5.188E-06 | global batch size: 16 | lm loss: 7.551642E+00 | loss scale: 16384.0 | grad norm: 96715.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1170/ 159576 | consumed samples: 18720 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.192E-06 | global batch size: 16 | lm loss: 7.354926E+00 | loss scale: 16384.0 | grad norm: 108931.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1171/ 159576 | consumed samples: 18736 | elapsed time per iteration (ms): 13735.2 | learning rate: 5.197E-06 | global batch size: 16 | lm loss: 7.360828E+00 | loss scale: 16384.0 | grad norm: 93043.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1172/ 159576 | consumed samples: 18752 | elapsed time per iteration (ms): 13717.8 | learning rate: 5.201E-06 | global batch size: 16 | lm loss: 7.538117E+00 | loss scale: 16384.0 | grad norm: 318365.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1173/ 159576 | consumed samples: 18768 | elapsed time per iteration (ms): 13883.3 | learning rate: 5.206E-06 | global batch size: 16 | lm loss: 7.601986E+00 | loss scale: 16384.0 | grad norm: 139775.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1174/ 159576 | consumed samples: 18784 | elapsed time per iteration (ms): 13707.5 | learning rate: 5.210E-06 | global batch size: 16 | lm loss: 7.492588E+00 | loss scale: 16384.0 | grad norm: 90689.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1175/ 159576 | consumed samples: 18800 | elapsed time per iteration (ms): 13678.7 | learning rate: 5.214E-06 | global batch size: 16 | lm loss: 7.586353E+00 | loss scale: 16384.0 | grad norm: 123587.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1176/ 159576 | consumed samples: 18816 | elapsed time per iteration (ms): 13643.8 | learning rate: 5.219E-06 | global batch size: 16 | lm loss: 7.585982E+00 | loss scale: 16384.0 | grad norm: 134121.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1177/ 159576 | consumed samples: 18832 | elapsed time per iteration (ms): 13876.6 | learning rate: 5.223E-06 | global batch size: 16 | lm loss: 7.290177E+00 | loss scale: 16384.0 | grad norm: 61795.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1178/ 159576 | consumed samples: 18848 | elapsed time per iteration (ms): 13887.6 | learning rate: 5.228E-06 | global 
batch size: 16 | lm loss: 7.394442E+00 | loss scale: 16384.0 | grad norm: 214580.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1179/ 159576 | consumed samples: 18864 | elapsed time per iteration (ms): 13671.2 | learning rate: 5.232E-06 | global batch size: 16 | lm loss: 7.342830E+00 | loss scale: 16384.0 | grad norm: 170377.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1180/ 159576 | consumed samples: 18880 | elapsed time per iteration (ms): 13615.6 | learning rate: 5.237E-06 | global batch size: 16 | lm loss: 7.353875E+00 | loss scale: 16384.0 | grad norm: 98364.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1181/ 159576 | consumed samples: 18896 | elapsed time per iteration (ms): 13659.2 | learning rate: 5.241E-06 | global batch size: 16 | lm loss: 7.310112E+00 | loss scale: 16384.0 | grad norm: 153347.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1182/ 159576 | consumed samples: 18912 | elapsed time per iteration (ms): 13718.2 | learning rate: 5.246E-06 | global batch size: 16 | lm loss: 7.516181E+00 | loss scale: 16384.0 | grad norm: 183425.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1183/ 159576 | consumed samples: 18928 | elapsed time per iteration (ms): 13614.7 | learning rate: 5.250E-06 | global batch size: 16 | lm loss: 7.284205E+00 | loss scale: 16384.0 | grad norm: 116539.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1184/ 159576 | consumed samples: 18944 | elapsed time per iteration (ms): 13636.1 | learning rate: 5.254E-06 | global batch size: 16 | lm loss: 7.392292E+00 | loss scale: 16384.0 | grad norm: 167498.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1185/ 159576 | consumed samples: 18960 | elapsed time per iteration (ms): 13633.9 | learning rate: 5.259E-06 | global batch size: 16 | lm loss: 7.250909E+00 | loss scale: 16384.0 | grad norm: 100955.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1186/ 159576 | consumed samples: 18976 | elapsed time per iteration (ms): 13999.4 | learning rate: 5.263E-06 | global batch size: 16 | lm loss: 7.536862E+00 | loss scale: 16384.0 | grad norm: 100050.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1187/ 159576 | consumed samples: 18992 | elapsed time per iteration (ms): 13653.6 | learning rate: 5.268E-06 | global batch size: 16 | lm loss: 7.565104E+00 | loss scale: 16384.0 | grad norm: 118619.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1188/ 159576 | consumed samples: 19008 | elapsed time per iteration (ms): 13606.5 | learning rate: 5.272E-06 | global batch size: 16 | lm loss: 7.258739E+00 | loss scale: 16384.0 | grad norm: 126790.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1189/ 159576 | consumed samples: 19024 | elapsed time per iteration (ms): 13571.9 | learning rate: 5.277E-06 | global batch size: 16 | lm loss: 7.184493E+00 | loss scale: 16384.0 | grad norm: 84818.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1190/ 159576 | consumed samples: 19040 | elapsed time per iteration (ms): 13962.3 | learning rate: 5.281E-06 | global batch size: 16 | lm loss: 7.209998E+00 | loss scale: 16384.0 | grad norm: 131280.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1191/ 159576 | consumed samples: 19056 | elapsed time per iteration (ms): 13770.8 | learning rate: 5.286E-06 | global batch size: 16 | lm loss: 7.406217E+00 | loss scale: 16384.0 | grad norm: 110178.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1193/ 159576 | consumed samples: 19088 | elapsed time per iteration (ms): 13585.9 | learning rate: 5.294E-06 | global batch size: 16 | lm loss: 7.583058E+00 | loss scale: 16384.0 | grad norm: 291080.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1194/ 159576 | consumed samples: 19104 | elapsed time per iteration (ms): 13658.0 | learning rate: 5.299E-06 | global batch size: 16 | lm loss: 7.808938E+00 | loss scale: 16384.0 | grad norm: 193632.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1195/ 159576 | consumed samples: 19120 | elapsed time per iteration (ms): 13777.0 | learning rate: 5.303E-06 | global batch size: 16 | lm loss: 7.459247E+00 | loss scale: 16384.0 | grad norm: 100738.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1196/ 159576 | consumed samples: 19136 | elapsed time per iteration (ms): 13624.3 | learning rate: 5.308E-06 | global batch size: 16 | lm loss: 7.240894E+00 | loss scale: 16384.0 | grad norm: 102223.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1197/ 159576 | consumed samples: 19152 | elapsed time per iteration (ms): 13630.2 | learning rate: 5.312E-06 | global batch size: 16 | lm loss: 7.469604E+00 | loss scale: 16384.0 | grad norm: 91547.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1198/ 159576 | consumed samples: 19168 | elapsed time per iteration (ms): 13603.4 | learning rate: 5.317E-06 | global batch size: 16 | lm loss: 7.399169E+00 | loss scale: 16384.0 | grad norm: 246196.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1199/ 159576 | consumed samples: 19184 | elapsed time per iteration (ms): 14028.5 | learning rate: 5.321E-06 | global batch size: 16 | lm loss: 7.465099E+00 | loss scale: 16384.0 | grad norm: 185665.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1200/ 159576 | consumed samples: 19200 | elapsed time per iteration (ms): 13601.1 | learning rate: 5.325E-06 | global batch size: 16 | lm loss: 7.383169E+00 | loss scale: 16384.0 | grad norm: 115872.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1201/ 159576 | consumed samples: 19216 | elapsed time per iteration (ms): 13566.6 | learning rate: 5.330E-06 | global batch size: 16 | lm loss: 7.352910E+00 | loss scale: 16384.0 | grad norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1202/ 159576 | consumed samples: 19232 | elapsed time per iteration (ms): 13557.4 | learning rate: 5.334E-06 | global batch size: 16 | lm loss: 7.521720E+00 | loss scale: 16384.0 | grad norm: 101976.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1203/ 159576 | consumed samples: 19248 | elapsed time per iteration (ms): 13525.0 | learning rate: 5.339E-06 | global batch size: 16 | lm loss: 7.225696E+00 | loss scale: 16384.0 | grad norm: 178745.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1204/ 159576 | consumed samples: 19264 | elapsed time per iteration (ms): 13539.3 | learning rate: 5.343E-06 | global batch size: 16 | lm loss: 7.375963E+00 | loss scale: 16384.0 | grad norm: 175723.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1205/ 159576 | consumed samples: 19280 | elapsed time per iteration (ms): 13532.3 | learning rate: 5.348E-06 | global batch size: 16 | lm loss: 7.402988E+00 | loss scale: 16384.0 | grad norm: 104645.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1206/ 159576 | consumed samples: 19296 | elapsed time per iteration (ms): 13502.9 | learning rate: 5.352E-06 | global batch size: 16 | lm loss: 7.302839E+00 | loss scale: 16384.0 | grad norm: 99328.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1207/ 159576 | consumed samples: 19312 | elapsed time per iteration (ms): 13540.4 | learning rate: 5.357E-06 | global batch size: 16 | lm loss: 7.555269E+00 | loss scale: 16384.0 | grad norm: 89166.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1208/ 159576 | consumed samples: 19328 | elapsed time per iteration (ms): 13900.0 | learning rate: 5.361E-06 | global batch size: 16 | lm loss: 7.459805E+00 | loss scale: 16384.0 | grad norm: 135152.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1209/ 159576 | consumed samples: 19344 | elapsed time per iteration (ms): 13560.6 | learning rate: 5.365E-06 | global batch size: 16 | lm loss: 7.419579E+00 | loss scale: 16384.0 | grad norm: 101249.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1210/ 159576 | consumed samples: 19360 | elapsed time per iteration (ms): 13658.8 | learning rate: 5.370E-06 | global batch size: 16 | lm loss: 7.348646E+00 | loss scale: 16384.0 | grad norm: 104483.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1211/ 159576 | consumed samples: 19376 | elapsed time per iteration (ms): 13533.6 | learning rate: 5.374E-06 | global batch size: 16 | lm loss: 7.494230E+00 | loss scale: 16384.0 | grad norm: 110210.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1212/ 159576 | consumed samples: 19392 | elapsed time per iteration (ms): 13905.0 | learning rate: 5.379E-06 | global batch size: 16 | lm loss: 7.390188E+00 | loss scale: 16384.0 | grad norm: 96645.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1213/ 159576 | consumed samples: 19408 | elapsed time per iteration (ms): 13673.2 | learning rate: 5.383E-06 | global batch size: 16 | lm loss: 7.318599E+00 | loss scale: 16384.0 | grad norm: 166216.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1214/ 159576 | consumed samples: 19424 | elapsed time per iteration (ms): 13582.9 | learning rate: 5.388E-06 | global batch size: 16 | lm loss: 7.262068E+00 | loss scale: 16384.0 | grad norm: 75724.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1215/ 159576 | consumed samples: 19440 | elapsed time per iteration (ms): 13570.1 | learning rate: 5.392E-06 | global batch size: 16 | lm loss: 7.594563E+00 | loss scale: 16384.0 | grad norm: 95306.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1216/ 159576 | consumed samples: 19456 | elapsed time per iteration (ms): 13639.7 | learning rate: 5.396E-06 | global batch size: 16 | lm loss: 7.375734E+00 | loss scale: 16384.0 | grad norm: 86152.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1217/ 159576 | consumed samples: 19472 | elapsed time per iteration (ms): 14091.6 | learning rate: 5.401E-06 | global batch size: 16 | lm loss: 7.213047E+00 | loss scale: 16384.0 | grad norm: 95583.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1218/ 159576 | consumed samples: 19488 | elapsed time per iteration (ms): 13516.3 | learning rate: 5.405E-06 | global batch size: 16 | lm loss: 7.437682E+00 | loss scale: 16384.0 | grad norm: 221549.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1219/ 159576 | consumed samples: 19504 | elapsed time per iteration (ms): 13610.0 | learning rate: 5.410E-06 | global batch size: 16 | lm loss: 7.254605E+00 | loss scale: 16384.0 | grad norm: 97554.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1220/ 159576 | consumed samples: 19520 | elapsed time per iteration (ms): 13565.5 | learning rate: 5.414E-06 | global batch size: 16 | lm loss: 7.248229E+00 | loss scale: 16384.0 | grad norm: 89138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1221/ 159576 | consumed samples: 19536 | elapsed time per iteration (ms): 13989.3 | learning rate: 5.419E-06 | global batch size: 16 | lm loss: 7.313151E+00 | loss scale: 16384.0 | grad norm: 172651.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1222/ 159576 | consumed samples: 19552 | elapsed time per iteration (ms): 13602.4 | learning rate: 5.423E-06 | global batch size: 16 | lm loss: 7.476789E+00 | loss scale: 16384.0 | grad norm: 67387.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1223/ 159576 | consumed samples: 19568 | elapsed time per iteration (ms): 13656.0 | learning rate: 5.428E-06 | global batch size: 16 | lm loss: 7.289939E+00 | loss scale: 16384.0 | grad norm: 207125.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1224/ 159576 | consumed samples: 19584 | elapsed time per iteration (ms): 13537.8 | learning rate: 5.432E-06 | global batch size: 16 | lm loss: 7.409894E+00 | loss scale: 16384.0 | grad norm: 156218.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1225/ 159576 | consumed samples: 19600 | elapsed time per iteration (ms): 13600.0 | learning rate: 5.436E-06 | global batch size: 16 | lm loss: 7.226832E+00 | loss scale: 16384.0 | grad norm: 93258.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1226/ 159576 | consumed samples: 19616 | elapsed time per iteration (ms): 13778.7 | learning rate: 5.441E-06 | global batch size: 16 | lm loss: 7.406470E+00 | loss scale: 16384.0 | grad norm: 95037.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1227/ 159576 | consumed samples: 19632 | elapsed time per iteration (ms): 13609.5 | learning rate: 5.445E-06 | global batch size: 16 | lm loss: 7.385060E+00 | loss scale: 16384.0 | grad norm: 77831.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1228/ 159576 | consumed samples: 19648 | elapsed time per iteration (ms): 13561.8 | learning rate: 5.450E-06 | global batch size: 16 | lm loss: 7.283795E+00 | loss scale: 16384.0 | grad norm: 219813.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1229/ 159576 | consumed samples: 19664 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.454E-06 | global batch size: 16 | lm loss: 7.344219E+00 | loss scale: 16384.0 | grad norm: 122192.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1230/ 159576 | consumed samples: 19680 | elapsed time per iteration (ms): 14054.6 | learning rate: 5.459E-06 | global batch size: 16 | lm loss: 7.364305E+00 | loss scale: 16384.0 | grad norm: 90944.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1231/ 159576 | consumed samples: 19696 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.463E-06 | global batch size: 16 | lm loss: 7.421730E+00 | loss scale: 16384.0 | grad norm: 178816.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1232/ 159576 | consumed samples: 19712 | elapsed time per iteration (ms): 13624.6 | learning rate: 5.467E-06 | global batch size: 16 | lm loss: 7.278720E+00 | loss scale: 16384.0 | grad norm: 101190.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1233/ 159576 | consumed samples: 19728 | elapsed time per iteration (ms): 13574.7 | learning rate: 5.472E-06 | global batch size: 16 | lm loss: 7.525582E+00 | loss scale: 16384.0 | grad norm: 95476.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1234/ 159576 | consumed samples: 19744 | elapsed time per iteration (ms): 13981.0 | learning rate: 5.476E-06 | global batch size: 16 | lm loss: 7.294508E+00 | loss scale: 16384.0 | grad norm: 110379.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1235/ 159576 | consumed samples: 19760 | elapsed time per iteration (ms): 13641.1 | learning rate: 5.481E-06 | global batch size: 16 | lm loss: 7.431972E+00 | loss scale: 16384.0 | grad norm: 103188.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1236/ 159576 | consumed samples: 19776 | elapsed time per iteration (ms): 13575.4 | learning rate: 5.485E-06 | global batch size: 16 | lm loss: 7.397687E+00 | loss scale: 16384.0 | grad norm: 92125.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1237/ 159576 | consumed samples: 19792 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.490E-06 | global batch size: 16 | lm loss: 7.314774E+00 | loss scale: 16384.0 | grad norm: 75870.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1238/ 159576 | consumed samples: 19808 | elapsed time per iteration (ms): 13509.4 | learning rate: 5.494E-06 | global batch size: 16 | lm loss: 7.187806E+00 | loss scale: 16384.0 | grad norm: 173296.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1239/ 159576 | consumed samples: 19824 | elapsed time per iteration (ms): 13875.3 | learning rate: 5.499E-06 | global batch size: 16 | lm loss: 7.376097E+00 | loss scale: 16384.0 | grad norm: 133632.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1240/ 159576 | consumed samples: 19840 | elapsed time per iteration (ms): 13610.1 | learning rate: 5.503E-06 | global batch size: 16 | lm loss: 7.267582E+00 | loss scale: 16384.0 | grad norm: 85104.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1241/ 159576 | consumed samples: 19856 | elapsed time per iteration (ms): 13551.5 | learning rate: 5.507E-06 | global batch size: 16 | lm loss: 7.352735E+00 | loss scale: 16384.0 | grad norm: 90699.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1242/ 159576 | consumed samples: 19872 | elapsed time per iteration (ms): 13593.9 | learning rate: 5.512E-06 | global batch size: 16 | lm loss: 7.468503E+00 | loss scale: 16384.0 | grad norm: 83188.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1243/ 159576 | consumed samples: 19888 | elapsed time per iteration (ms): 13930.9 | learning rate: 5.516E-06 | global batch size: 16 | lm loss: 7.214951E+00 | loss scale: 16384.0 | grad norm: 78366.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1244/ 159576 | consumed samples: 19904 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | grad norm: 80928.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 07:03:47] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 07:03:47] PULSE: tr8-104B is running for 1:11:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 1245/ 159576 | consumed samples: 19920 | elapsed time per iteration (ms): 13521.2 | learning rate: 5.525E-06 | global batch size: 16 | lm loss: 7.539850E+00 | loss scale: 16384.0 | grad norm: 85379.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1246/ 159576 | consumed samples: 19936 | elapsed time per iteration (ms): 13540.5 | learning rate: 5.530E-06 | global batch size: 16 | lm loss: 7.541747E+00 | loss scale: 16384.0 | grad norm: 112594.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1247/ 159576 | consumed samples: 19952 | elapsed time per iteration (ms): 13599.8 | learning rate: 5.534E-06 | global batch size: 16 | lm loss: 7.427727E+00 | loss scale: 16384.0 | grad norm: 75830.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1248/ 159576 | consumed samples: 19968 | elapsed time per iteration (ms): 13827.8 | learning rate: 5.538E-06 | global batch size: 16 | lm loss: 7.407825E+00 | loss scale: 16384.0 | grad norm: 125194.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1249/ 159576 | consumed samples: 19984 | elapsed time per iteration (ms): 13505.2 | learning rate: 5.543E-06 | global batch size: 16 | lm loss: 7.566711E+00 | loss scale: 16384.0 | grad norm: 116825.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1250/ 159576 | consumed samples: 20000 | elapsed time per iteration (ms): 13584.6 | learning rate: 5.547E-06 | global batch size: 16 | lm loss: 7.156756E+00 | loss scale: 16384.0 | grad norm: 75875.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1251/ 159576 | consumed samples: 20016 | elapsed time per iteration (ms): 13599.4 | learning rate: 5.552E-06 | global batch size: 16 | lm loss: 7.355666E+00 | loss scale: 16384.0 | grad norm: 128516.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1252/ 159576 | consumed samples: 20032 | elapsed time per iteration (ms): 13882.6 | learning rate: 5.556E-06 | global batch size: 16 | lm loss: 7.339529E+00 | loss scale: 16384.0 | grad norm: 92000.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1253/ 159576 | consumed samples: 20048 | elapsed time per iteration (ms): 13669.5 | learning rate: 5.561E-06 | global batch size: 16 | lm loss: 7.246970E+00 | loss scale: 16384.0 | grad norm: 68938.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1254/ 159576 | consumed samples: 20064 | elapsed time per iteration (ms): 13534.9 | learning rate: 5.565E-06 | global batch size: 16 | lm loss: 7.505607E+00 | loss scale: 16384.0 | grad norm: 103078.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1255/ 159576 | consumed samples: 20080 | elapsed time per iteration (ms): 13594.8 | learning rate: 5.570E-06 | global batch size: 16 | lm loss: 7.386476E+00 | loss scale: 16384.0 | grad norm: 104529.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1256/ 159576 | consumed samples: 20096 | elapsed time per iteration (ms): 13795.8 | learning rate: 5.574E-06 | global batch size: 16 | lm loss: 7.263406E+00 | loss scale: 16384.0 | grad norm: 82840.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1257/ 159576 | consumed samples: 20112 | elapsed time per iteration (ms): 13529.7 | learning rate: 5.578E-06 | global batch size: 16 | lm loss: 7.356731E+00 | loss scale: 16384.0 | grad norm: 64612.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1258/ 159576 | consumed samples: 20128 | elapsed time per iteration (ms): 13538.7 | learning rate: 5.583E-06 | global batch size: 16 | lm loss: 7.516888E+00 | loss scale: 16384.0 | grad norm: 136048.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1259/ 159576 | consumed samples: 20144 | elapsed time per iteration (ms): 13556.0 | learning rate: 5.587E-06 | global batch size: 16 | lm loss: 7.352553E+00 | loss scale: 16384.0 | grad norm: 81380.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1260/ 159576 | consumed samples: 20160 | elapsed time per iteration (ms): 13488.1 | learning rate: 5.592E-06 | global batch size: 16 | lm loss: 7.385587E+00 | loss scale: 16384.0 | grad norm: 121637.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1261/ 159576 | consumed samples: 20176 | elapsed time per iteration (ms): 13803.4 | learning rate: 5.596E-06 | global batch size: 16 | lm loss: 7.280743E+00 | loss scale: 16384.0 | grad norm: 89726.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1262/ 159576 | consumed samples: 20192 | elapsed time per iteration (ms): 13426.2 | learning rate: 5.601E-06 | global batch size: 16 | lm loss: 7.512013E+00 | loss scale: 16384.0 | grad norm: 85518.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1263/ 159576 | consumed samples: 20208 | elapsed time per iteration (ms): 13492.1 | learning rate: 5.605E-06 | global batch size: 16 | lm loss: 7.145048E+00 | loss scale: 16384.0 | grad norm: 112279.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1264/ 159576 | consumed samples: 20224 | elapsed time per iteration (ms): 13537.9 | learning rate: 5.609E-06 | global batch size: 16 | lm loss: 7.608912E+00 | loss scale: 16384.0 | grad norm: 96612.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1265/ 159576 | consumed samples: 20240 | elapsed time per iteration (ms): 13857.6 | learning rate: 5.614E-06 | global batch size: 16 | lm loss: 7.316525E+00 | loss scale: 16384.0 | grad norm: 73736.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1266/ 159576 | consumed samples: 20256 | elapsed time per iteration (ms): 13475.3 | learning rate: 5.618E-06 | global batch size: 16 | lm loss: 7.406303E+00 | loss scale: 16384.0 | grad norm: 69485.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1267/ 159576 | consumed samples: 20272 | elapsed time per iteration (ms): 13513.4 | learning rate: 5.623E-06 | global batch size: 16 | lm loss: 7.282144E+00 | loss scale: 16384.0 | grad norm: 72619.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1268/ 159576 | consumed samples: 20288 | elapsed time per iteration (ms): 13517.8 | learning rate: 5.627E-06 | global batch size: 16 | lm loss: 7.419368E+00 | loss scale: 16384.0 | grad norm: 107085.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1269/ 159576 | consumed samples: 20304 | elapsed time per iteration (ms): 13507.2 | learning rate: 5.632E-06 | global batch size: 16 | lm loss: 7.427319E+00 | loss scale: 16384.0 | grad norm: 75455.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1270/ 159576 | consumed samples: 20320 | elapsed time per iteration (ms): 13744.8 | learning rate: 5.636E-06 | global batch size: 16 | lm loss: 7.348005E+00 | loss scale: 16384.0 | grad norm: 119801.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1271/ 159576 | consumed samples: 20336 | elapsed time per iteration (ms): 13569.3 | learning rate: 5.641E-06 | global batch size: 16 | lm loss: 7.365005E+00 | loss scale: 16384.0 | grad norm: 64957.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1272/ 159576 | consumed samples: 20352 | elapsed time per iteration (ms): 13569.6 | learning rate: 5.645E-06 | global batch size: 16 | lm loss: 7.429317E+00 | loss scale: 16384.0 | grad norm: 178872.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1273/ 159576 | consumed samples: 20368 | elapsed time per iteration (ms): 13472.8 | learning rate: 5.649E-06 | global batch size: 16 | lm loss: 7.312444E+00 | loss scale: 16384.0 | grad norm: 131489.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1274/ 159576 | consumed samples: 20384 | elapsed time per iteration (ms): 14043.7 | learning rate: 5.654E-06 | global batch size: 16 | lm loss: 7.280907E+00 | loss scale: 16384.0 | grad norm: 80742.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1275/ 159576 | consumed samples: 20400 | elapsed time per iteration (ms): 13515.6 | learning rate: 5.658E-06 | global batch size: 16 | lm loss: 7.473969E+00 | loss scale: 16384.0 | grad norm: 192617.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1276/ 159576 | consumed samples: 20416 | elapsed time per iteration (ms): 13555.1 | learning rate: 5.663E-06 | global batch size: 16 | lm loss: 7.571683E+00 | loss scale: 16384.0 | grad norm: 142231.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1277/ 159576 | consumed samples: 20432 | elapsed time per iteration (ms): 13684.0 | learning rate: 5.667E-06 | global batch size: 16 | lm loss: 7.370350E+00 | loss scale: 16384.0 | grad norm: 91290.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1278/ 159576 | consumed samples: 20448 | elapsed time per iteration (ms): 14108.9 | learning rate: 5.672E-06 | global batch size: 16 | lm loss: 7.258504E+00 | loss scale: 16384.0 | grad norm: 111985.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1279/ 159576 | consumed samples: 20464 | elapsed time per iteration (ms): 13599.8 | learning rate: 5.676E-06 | global batch size: 16 | lm loss: 7.378584E+00 | loss scale: 16384.0 | grad norm: 101238.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1280/ 159576 | consumed samples: 20480 | elapsed time per iteration (ms): 13689.3 | learning rate: 5.680E-06 | global batch size: 16 | lm loss: 7.344358E+00 | loss scale: 16384.0 | grad norm: 131175.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1281/ 159576 | consumed samples: 20496 | elapsed time per iteration (ms): 13675.0 | learning rate: 5.685E-06 | global batch size: 16 | lm loss: 7.253249E+00 | loss scale: 16384.0 | grad norm: 81245.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1282/ 159576 | consumed samples: 20512 | elapsed time per iteration (ms): 13723.8 | learning rate: 5.689E-06 | global batch size: 16 | lm loss: 7.385771E+00 | loss scale: 16384.0 | grad norm: 80281.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1283/ 159576 | consumed samples: 20528 | elapsed time per iteration (ms): 13839.8 | learning rate: 5.694E-06 | global batch size: 16 | lm loss: 7.253633E+00 | loss scale: 16384.0 | grad norm: 106168.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1284/ 159576 | consumed samples: 20544 | elapsed time per iteration (ms): 13645.0 | learning rate: 5.698E-06 | global batch size: 16 | lm loss: 7.091393E+00 | loss scale: 16384.0 | grad norm: 119249.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1285/ 159576 | consumed samples: 20560 | elapsed time per iteration (ms): 13673.3 | learning rate: 5.703E-06 | global batch size: 16 | lm loss: 7.346157E+00 | loss scale: 16384.0 | grad norm: 87118.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1286/ 159576 | consumed samples: 20576 | elapsed time per iteration (ms): 13680.7 | learning rate: 5.707E-06 | global batch size: 16 | lm loss: 7.301017E+00 | loss scale: 16384.0 | grad norm: 66813.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1287/ 159576 | consumed samples: 20592 | elapsed time per iteration (ms): 14107.0 | learning rate: 5.712E-06 | global batch size: 16 | lm loss: 7.228415E+00 | loss scale: 16384.0 | grad norm: 90274.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1288/ 159576 | consumed samples: 20608 | elapsed time per iteration (ms): 13593.6 | learning rate: 5.716E-06 | global batch size: 16 | lm loss: 7.412420E+00 | loss scale: 16384.0 | grad norm: 74854.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1289/ 159576 | consumed samples: 20624 | elapsed time per iteration (ms): 13657.4 | learning rate: 5.720E-06 | global batch size: 16 | lm loss: 7.296477E+00 | loss scale: 16384.0 | grad norm: 78756.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1290/ 159576 | consumed samples: 20640 | elapsed time per iteration (ms): 13628.7 | learning rate: 5.725E-06 | global batch size: 16 | lm loss: 7.091270E+00 | loss scale: 16384.0 | grad norm: 77550.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1291/ 159576 | consumed samples: 20656 | elapsed time per iteration (ms): 13654.9 | learning rate: 5.729E-06 | global batch size: 16 | lm loss: 7.247941E+00 | loss scale: 16384.0 | grad norm: 140565.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1292/ 159576 | consumed samples: 20672 | elapsed time per iteration (ms): 13789.5 | learning rate: 5.734E-06 | global batch size: 16 | lm loss: 7.326149E+00 | loss scale: 16384.0 | grad norm: 66170.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1293/ 159576 | consumed samples: 20688 | elapsed time per iteration (ms): 13629.3 | learning rate: 5.738E-06 | global batch size: 16 | lm loss: 7.358797E+00 | loss scale: 16384.0 | grad norm: 94692.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1294/ 159576 | consumed samples: 20704 | elapsed time per iteration (ms): 13584.0 | learning rate: 5.743E-06 | global batch size: 16 | lm loss: 7.254357E+00 | loss scale: 16384.0 | grad norm: 69169.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1295/ 159576 | consumed samples: 20720 | elapsed time per iteration (ms): 13612.6 | learning rate: 5.747E-06 | global batch size: 16 | lm loss: 7.449785E+00 | loss scale: 16384.0 | grad norm: 180039.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1296/ 159576 | consumed samples: 20736 | elapsed time per iteration (ms): 13948.4 | learning rate: 5.751E-06 | global batch size: 16 | lm loss: 7.506041E+00 | loss scale: 16384.0 | grad norm: 147606.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1297/ 159576 | consumed samples: 20752 | elapsed time per iteration (ms): 13604.2 | learning rate: 5.756E-06 | global batch size: 16 | lm loss: 7.265352E+00 | loss scale: 16384.0 | grad norm: 87511.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1298/ 159576 | consumed samples: 20768 | elapsed time per iteration (ms): 13622.0 | learning rate: 5.760E-06 | global batch size: 16 | lm loss: 7.446327E+00 | loss scale: 16384.0 | grad norm: 91155.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1299/ 159576 | consumed samples: 20784 | elapsed time per iteration (ms): 13674.5 | learning rate: 5.765E-06 | global batch size: 16 | lm loss: 7.469901E+00 | loss scale: 16384.0 | grad norm: 219048.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1300/ 159576 | consumed samples: 20800 | elapsed time per iteration (ms): 13848.4 | learning rate: 5.769E-06 | global batch size: 16 | lm loss: 7.389014E+00 | loss scale: 16384.0 | grad norm: 84402.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1301/ 159576 | consumed samples: 20816 | elapsed time per iteration (ms): 13625.0 | learning rate: 5.774E-06 | global batch size: 16 | lm loss: 7.303530E+00 | loss scale: 16384.0 | grad norm: 174901.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1302/ 159576 | consumed samples: 20832 | elapsed time per iteration (ms): 13624.5 | learning rate: 5.778E-06 | global batch size: 16 | lm loss: 7.358258E+00 | loss scale: 16384.0 | grad norm: 146018.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1303/ 159576 | consumed samples: 20848 | elapsed time per iteration (ms): 13602.8 | learning rate: 5.783E-06 | global batch size: 16 | lm loss: 7.337800E+00 | loss scale: 16384.0 | grad norm: 109327.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1304/ 159576 | consumed samples: 20864 | elapsed time per iteration (ms): 13628.1 | learning rate: 5.787E-06 | global batch size: 16 | lm loss: 7.310088E+00 | loss scale: 16384.0 | grad norm: 83547.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1305/ 159576 | consumed samples: 20880 | elapsed time per iteration (ms): 13754.8 | learning rate: 5.791E-06 | global batch size: 16 | lm loss: 7.464965E+00 | loss scale: 16384.0 | grad norm: 695515.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1306/ 159576 | consumed samples: 20896 | elapsed time per iteration (ms): 13652.7 | learning rate: 5.796E-06 | global batch size: 16 | lm loss: 7.764376E+00 | loss scale: 16384.0 | grad norm: 569876.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1307/ 159576 | consumed samples: 20912 | elapsed time per iteration (ms): 13609.0 | learning rate: 5.800E-06 | global batch size: 16 | lm loss: 7.550226E+00 | loss scale: 16384.0 | grad norm: 356748.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1308/ 159576 | consumed samples: 20928 | elapsed time per iteration (ms): 13602.6 | learning rate: 5.805E-06 | global batch size: 16 | lm loss: 7.402792E+00 | loss scale: 16384.0 | grad norm: 159267.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1309/ 159576 | consumed samples: 20944 | elapsed time per iteration (ms): 13968.8 | learning rate: 5.809E-06 | global batch size: 16 | lm loss: 7.204682E+00 | loss scale: 16384.0 | grad norm: 129995.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1310/ 159576 | consumed samples: 20960 | elapsed time per iteration (ms): 13646.5 | learning rate: 5.814E-06 | global batch size: 16 | lm loss: 7.591084E+00 | loss scale: 16384.0 | grad norm: 143380.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1311/ 159576 | consumed samples: 20976 | elapsed time per iteration (ms): 13595.1 | learning rate: 5.818E-06 | global batch size: 16 | lm loss: 7.316426E+00 | loss scale: 16384.0 | grad norm: 150593.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1312/ 159576 | consumed samples: 20992 | elapsed time per iteration (ms): 13595.5 | learning rate: 5.822E-06 | global batch size: 16 | lm loss: 7.305964E+00 | loss scale: 16384.0 | grad norm: 177049.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1313/ 159576 | consumed samples: 21008 | elapsed time per iteration (ms): 13979.9 | learning rate: 5.827E-06 | global batch size: 16 | lm loss: 7.567747E+00 | loss scale: 16384.0 | grad norm: 169809.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1314/ 159576 | consumed samples: 21024 | elapsed time per iteration (ms): 13640.7 | learning rate: 5.831E-06 | global batch size: 16 | lm loss: 7.395080E+00 | loss scale: 16384.0 | grad norm: 145564.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1315/ 159576 | consumed samples: 21040 | elapsed time per iteration (ms): 13592.0 | learning rate: 5.836E-06 | global batch size: 16 | lm loss: 7.317047E+00 | loss scale: 16384.0 | grad norm: 104694.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1316/ 159576 | consumed samples: 21056 | elapsed time per iteration (ms): 13586.9 | learning rate: 5.840E-06 | global batch size: 16 | lm loss: 7.255484E+00 | loss scale: 16384.0 | grad norm: 93976.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1317/ 159576 | consumed samples: 21072 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.845E-06 | global batch size: 16 | lm loss: 7.440733E+00 | loss scale: 16384.0 | grad norm: 181969.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1318/ 159576 | consumed samples: 21088 | elapsed time per iteration (ms): 13777.5 | learning rate: 5.849E-06 | global batch size: 16 | lm loss: 7.425194E+00 | loss scale: 16384.0 | grad norm: 109784.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1319/ 159576 | consumed samples: 21104 | elapsed time per iteration (ms): 13622.9 | learning rate: 5.854E-06 | global batch size: 16 | lm loss: 7.338997E+00 | loss scale: 16384.0 | grad norm: 146618.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1320/ 159576 | consumed samples: 21120 | elapsed time per iteration (ms): 13655.9 | learning rate: 5.858E-06 | global batch size: 16 | lm loss: 7.517268E+00 | loss scale: 16384.0 | grad norm: 108508.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1321/ 159576 | consumed samples: 21136 | elapsed time per iteration (ms): 13535.6 | learning rate: 5.862E-06 | global batch size: 16 | lm loss: 7.358712E+00 | loss scale: 16384.0 | grad norm: 100699.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1322/ 159576 | consumed samples: 21152 | elapsed time per iteration (ms): 13935.1 | learning rate: 5.867E-06 | global batch size: 16 | lm loss: 7.184452E+00 | loss scale: 16384.0 | grad norm: 85896.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1323/ 159576 | consumed samples: 21168 | elapsed time per iteration (ms): 13612.2 | learning rate: 5.871E-06 | global batch size: 16 | lm loss: 7.388680E+00 | loss scale: 16384.0 | grad norm: 283765.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1324/ 159576 | consumed samples: 21184 | elapsed time per iteration (ms): 13600.2 | learning rate: 5.876E-06 | global batch size: 16 | lm loss: 7.594103E+00 | loss scale: 16384.0 | grad norm: 191758.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1325/ 159576 | consumed samples: 21200 | elapsed time per iteration (ms): 13592.0 | learning rate: 5.880E-06 | global batch size: 16 | lm loss: 7.443296E+00 | loss scale: 16384.0 | grad norm: 112255.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1326/ 159576 | consumed samples: 21216 | elapsed time per iteration (ms): 13594.2 | learning rate: 5.885E-06 | global batch size: 16 | lm loss: 7.192332E+00 | loss scale: 16384.0 | grad norm: 110320.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1327/ 159576 | consumed samples: 21232 | elapsed time per iteration (ms): 13762.8 | learning rate: 5.889E-06 | global batch size: 16 | lm loss: 8.096416E+00 | loss scale: 16384.0 | grad norm: 131448.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1328/ 159576 | consumed samples: 21248 | elapsed time per iteration (ms): 13579.8 | learning rate: 5.893E-06 | global batch size: 16 | lm loss: 7.433802E+00 | loss scale: 16384.0 | grad norm: 182837.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1329/ 159576 | consumed samples: 21264 | elapsed time per iteration (ms): 13581.7 | learning rate: 5.898E-06 | global batch size: 16 | lm loss: 7.172110E+00 | loss scale: 16384.0 | grad norm: 100348.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1330/ 159576 | consumed samples: 21280 | elapsed time per iteration (ms): 13583.6 | learning rate: 5.902E-06 | global batch size: 16 | lm loss: 7.240623E+00 | loss scale: 16384.0 | grad norm: 100150.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1331/ 159576 | consumed samples: 21296 | elapsed time per iteration (ms): 14102.4 | learning rate: 5.907E-06 | global batch size: 16 | lm loss: 7.203824E+00 | loss scale: 16384.0 | grad norm: 241560.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1332/ 159576 | consumed samples: 21312 | elapsed time per iteration (ms): 13644.3 | learning rate: 5.911E-06 | global batch size: 16 | lm loss: 7.245723E+00 | loss scale: 16384.0 | grad norm: 129411.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1333/ 159576 | consumed samples: 21328 | elapsed time per iteration (ms): 13656.6 | learning rate: 5.916E-06 | global batch size: 16 | lm loss: 7.574631E+00 | loss scale: 16384.0 | grad norm: 172987.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1334/ 159576 | consumed samples: 21344 | elapsed time per iteration (ms): 13588.8 | learning rate: 5.920E-06 | global batch size: 16 | lm loss: 7.287757E+00 | loss scale: 16384.0 | grad norm: 99651.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1335/ 159576 | consumed samples: 21360 | elapsed time per iteration (ms): 14011.8 | learning rate: 5.925E-06 | global batch size: 16 | lm loss: 7.268057E+00 | loss scale: 16384.0 | grad norm: 109280.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1336/ 159576 | consumed samples: 21376 | elapsed time per iteration (ms): 13624.4 | learning rate: 5.929E-06 | global batch size: 16 | lm loss: 7.062439E+00 | loss scale: 16384.0 | grad norm: 160438.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1337/ 159576 | consumed samples: 21392 | elapsed time per iteration (ms): 13544.1 | learning rate: 5.933E-06 | global batch size: 16 | lm loss: 7.233086E+00 | loss scale: 16384.0 | grad norm: 175313.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1338/ 159576 | consumed samples: 21408 | elapsed time per iteration (ms): 13619.6 | learning rate: 5.938E-06 | global batch size: 16 | lm loss: 7.333053E+00 | loss scale: 16384.0 | grad norm: 104091.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1339/ 159576 | consumed samples: 21424 | elapsed time per iteration (ms): 13622.4 | learning rate: 5.942E-06 | global batch size: 16 | lm loss: 7.263519E+00 | loss scale: 16384.0 | grad norm: 90175.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1340/ 159576 | consumed samples: 21440 | elapsed time per iteration (ms): 13736.6 | learning rate: 5.947E-06 | global batch size: 16 | lm loss: 7.445864E+00 | loss scale: 16384.0 | grad norm: 136689.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1341/ 159576 | consumed samples: 21456 | elapsed time per iteration (ms): 13686.3 | learning rate: 5.951E-06 | global batch size: 16 | lm loss: 7.362231E+00 | loss scale: 16384.0 | grad norm: 184602.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1342/ 159576 | consumed samples: 21472 | elapsed time per iteration (ms): 13488.8 | learning rate: 5.956E-06 | global batch size: 16 | lm loss: 7.368071E+00 | loss scale: 16384.0 | grad norm: 82633.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1343/ 159576 | consumed samples: 21488 | elapsed time per iteration (ms): 13605.8 | learning rate: 5.960E-06 | global batch size: 16 | lm loss: 7.327272E+00 | loss scale: 16384.0 | grad norm: 92741.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1344/ 159576 | consumed samples: 21504 | elapsed time per iteration (ms): 14069.0 | learning rate: 5.964E-06 | global batch size: 16 | lm loss: 7.323634E+00 | loss scale: 16384.0 | grad norm: 99780.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1345/ 159576 | consumed samples: 21520 | elapsed time per iteration (ms): 13450.7 | learning rate: 5.969E-06 | global batch size: 16 | lm loss: 7.741362E+00 | loss scale: 16384.0 | grad norm: 105396.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1346/ 159576 | consumed samples: 21536 | elapsed time per iteration (ms): 13598.3 | learning rate: 5.973E-06 | global batch size: 16 | lm loss: 7.280247E+00 | loss scale: 16384.0 | grad norm: 77724.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1347/ 159576 | consumed samples: 21552 | elapsed time per iteration (ms): 13585.6 | learning rate: 5.978E-06 | global batch size: 16 | lm loss: 7.398378E+00 | loss scale: 16384.0 | grad norm: 69954.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1348/ 159576 | consumed samples: 21568 | elapsed time per iteration (ms): 13610.3 | learning rate: 5.982E-06 | global batch size: 16 | lm loss: 7.321609E+00 | loss scale: 16384.0 | grad norm: 94086.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1349/ 159576 | consumed samples: 21584 | elapsed time per iteration (ms): 13777.1 | learning rate: 5.987E-06 | global batch size: 16 | lm loss: 7.188628E+00 | loss scale: 16384.0 | grad norm: 81475.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1350/ 159576 | consumed samples: 21600 | elapsed time per iteration (ms): 13566.9 | learning rate: 5.991E-06 | global batch size: 16 | lm loss: 7.515175E+00 | loss scale: 16384.0 | grad norm: 78780.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1351/ 159576 | consumed samples: 21616 | elapsed time per iteration (ms): 13622.9 | learning rate: 5.996E-06 | global batch size: 16 | lm loss: 7.231083E+00 | loss scale: 16384.0 | grad norm: 86153.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1352/ 159576 | consumed samples: 21632 | elapsed time per iteration (ms): 13562.3 | learning rate: 6.000E-06 | global batch size: 16 | lm loss: 7.206710E+00 | loss scale: 16384.0 | grad norm: 83949.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1353/ 159576 | consumed samples: 21648 | elapsed time per iteration (ms): 13968.8 | learning rate: 6.004E-06 | global batch size: 16 | lm loss: 7.293135E+00 | loss scale: 16384.0 | grad norm: 83956.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1354/ 159576 | consumed samples: 21664 | elapsed time per iteration (ms): 13680.7 | learning rate: 6.009E-06 | global batch size: 16 | lm loss: 7.282973E+00 | loss scale: 16384.0 | grad norm: 102770.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1355/ 159576 | consumed samples: 21680 | elapsed time per iteration (ms): 13601.4 | learning rate: 6.013E-06 | global batch size: 16 | lm loss: 7.427012E+00 | loss scale: 16384.0 | grad norm: 87455.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1356/ 159576 | consumed samples: 21696 | elapsed time per iteration (ms): 13542.1 | learning rate: 6.018E-06 | global batch size: 16 | lm loss: 7.529208E+00 | loss scale: 16384.0 | grad norm: 83130.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1357/ 159576 | consumed samples: 21712 | elapsed time per iteration (ms): 13961.0 | learning rate: 6.022E-06 | global batch size: 16 | lm loss: 7.327049E+00 | loss scale: 16384.0 | grad norm: 77841.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1358/ 159576 | consumed samples: 21728 | elapsed time per iteration (ms): 13587.5 | learning rate: 6.027E-06 | global batch size: 16 | lm loss: 7.267120E+00 | loss scale: 16384.0 | grad norm: 86295.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1359/ 159576 | consumed samples: 21744 | elapsed time per iteration (ms): 13505.9 | learning rate: 6.031E-06 | global batch size: 16 | lm loss: 7.190462E+00 | loss scale: 16384.0 | grad norm: 154865.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1360/ 159576 | consumed samples: 21760 | elapsed time per iteration (ms): 13616.0 | learning rate: 6.036E-06 | global batch size: 16 | lm loss: 7.321602E+00 | loss scale: 16384.0 | grad norm: 112461.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1361/ 159576 | consumed samples: 21776 | elapsed time per iteration (ms): 13547.3 | learning rate: 6.040E-06 | global batch size: 16 | lm loss: 7.145373E+00 | loss scale: 16384.0 | grad norm: 72055.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1362/ 159576 | consumed samples: 21792 | elapsed time per iteration (ms): 13692.3 | learning rate: 6.044E-06 | global batch size: 16 | lm loss: 7.077173E+00 | loss scale: 16384.0 | grad norm: 103896.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1363/ 159576 | consumed samples: 21808 | elapsed time per iteration (ms): 13612.5 | learning rate: 6.049E-06 | global batch size: 16 | lm loss: 7.245114E+00 | loss scale: 16384.0 | grad norm: 79354.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1364/ 159576 | consumed samples: 21824 | elapsed time per iteration (ms): 13541.3 | learning rate: 6.053E-06 | global batch size: 16 | lm loss: 7.281060E+00 | loss scale: 16384.0 | grad norm: 148274.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1365/ 159576 | consumed samples: 21840 | elapsed time per iteration (ms): 13609.2 | learning rate: 6.058E-06 | global batch size: 16 | lm loss: 7.401906E+00 | loss scale: 16384.0 | grad norm: 119123.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1366/ 159576 | consumed samples: 21856 | elapsed time per iteration (ms): 13916.7 | learning rate: 6.062E-06 | global batch size: 16 | lm loss: 7.338102E+00 | loss scale: 16384.0 | grad norm: 93708.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1367/ 159576 | consumed samples: 21872 | elapsed time per iteration (ms): 13536.5 | learning rate: 6.067E-06 | global batch size: 16 | lm loss: 7.494397E+00 | loss scale: 16384.0 | grad norm: 130779.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1368/ 159576 | consumed samples: 21888 | elapsed time per iteration (ms): 13577.1 | learning rate: 6.071E-06 | global batch size: 16 | lm loss: 7.007359E+00 | loss scale: 16384.0 | grad norm: 94271.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1369/ 159576 | consumed samples: 21904 | elapsed time per iteration (ms): 13571.4 | learning rate: 6.075E-06 | global batch size: 16 | lm loss: 7.129241E+00 | loss scale: 16384.0 | grad norm: 129962.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1370/ 159576 | consumed samples: 21920 | elapsed time per iteration (ms): 13603.2 | learning rate: 6.080E-06 | global batch size: 16 | lm loss: 7.323318E+00 | loss scale: 16384.0 | grad norm: 138541.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1371/ 159576 | consumed samples: 21936 | elapsed time per iteration (ms): 13998.6 | learning rate: 6.084E-06 | global batch size: 16 | lm loss: 7.164912E+00 | loss scale: 16384.0 | grad norm: 95366.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1372/ 159576 | consumed samples: 21952 | elapsed time per iteration (ms): 13587.8 | learning rate: 6.089E-06 | global batch size: 16 | lm loss: 7.207436E+00 | loss scale: 16384.0 | grad norm: 95481.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1373/ 159576 | consumed samples: 21968 | elapsed time per iteration (ms): 13570.1 | learning rate: 6.093E-06 | global batch size: 16 | lm loss: 7.245305E+00 | loss scale: 16384.0 | grad norm: 110814.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1374/ 159576 | consumed samples: 21984 | elapsed time per iteration (ms): 13553.5 | learning rate: 6.098E-06 | global batch size: 16 | lm loss: 7.184179E+00 | loss scale: 16384.0 | grad norm: 92107.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1375/ 159576 | consumed samples: 22000 | elapsed time per iteration (ms): 13994.4 | learning rate: 6.102E-06 | global batch size: 16 | lm loss: 7.117487E+00 | loss scale: 16384.0 | grad norm: 77237.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1376/ 159576 | consumed samples: 22016 | elapsed time per iteration (ms): 13625.6 | learning rate: 6.107E-06 | global batch size: 16 | lm loss: 7.445632E+00 | loss scale: 16384.0 | grad norm: 139111.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1377/ 159576 | consumed samples: 22032 | elapsed time per iteration (ms): 13559.3 | learning rate: 6.111E-06 | global batch size: 16 | lm loss: 7.513434E+00 | loss scale: 16384.0 | grad norm: 111307.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1378/ 159576 | consumed samples: 22048 | elapsed time per iteration (ms): 13608.4 | learning rate: 6.115E-06 | global batch size: 16 | lm loss: 7.255265E+00 | loss scale: 16384.0 | grad norm: 88273.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1379/ 159576 | consumed samples: 22064 | elapsed time per iteration (ms): 14048.5 | learning rate: 6.120E-06 | global batch size: 16 | lm loss: 7.123577E+00 | loss scale: 16384.0 | grad norm: 85346.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1380/ 159576 | consumed samples: 22080 | elapsed time per iteration (ms): 13485.1 | learning rate: 6.124E-06 | global batch size: 16 | lm loss: 7.134797E+00 | loss scale: 16384.0 | grad norm: 118284.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1381/ 159576 | consumed samples: 22096 | elapsed time per iteration (ms): 13616.6 | learning rate: 6.129E-06 | global batch size: 16 | lm loss: 7.281054E+00 | loss scale: 16384.0 | grad norm: 88229.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1382/ 159576 | consumed samples: 22112 | elapsed time per iteration (ms): 13576.6 | learning rate: 6.133E-06 | global batch size: 16 | lm loss: 7.397271E+00 | loss scale: 16384.0 | grad norm: 130821.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1383/ 159576 | consumed samples: 22128 | elapsed time per iteration (ms): 13587.8 | learning rate: 6.138E-06 | global batch size: 16 | lm loss: 7.362026E+00 | loss scale: 16384.0 | grad norm: 83450.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1384/ 159576 | consumed samples: 22144 | elapsed time per iteration (ms): 13848.8 | learning rate: 6.142E-06 | global batch size: 16 | lm loss: 7.275143E+00 | loss scale: 16384.0 | grad norm: 86287.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1385/ 159576 | consumed samples: 22160 | elapsed time per iteration (ms): 13576.9 | learning rate: 6.146E-06 | global batch size: 16 | lm loss: 7.400926E+00 | loss scale: 16384.0 | grad norm: 98321.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
time (ms) iteration 1386/ 159576 | consumed samples: 22176 | elapsed time per iteration (ms): 13627.2 | learning rate: 6.151E-06 | global batch size: 16 | lm loss: 7.151899E+00 | loss scale: 16384.0 | grad norm: 85060.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1387/ 159576 | consumed samples: 22192 | elapsed time per iteration (ms): 13519.4 | learning rate: 6.155E-06 | global batch size: 16 | lm loss: 7.335835E+00 | loss scale: 16384.0 | grad norm: 64450.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1388/ 159576 | consumed samples: 22208 | elapsed time per iteration (ms): 13906.1 | learning rate: 6.160E-06 | global batch size: 16 | lm loss: 7.316273E+00 | loss scale: 16384.0 | grad norm: 66517.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1389/ 159576 | consumed samples: 22224 | elapsed time per iteration (ms): 13589.2 | learning rate: 6.164E-06 | global batch size: 16 | lm loss: 7.190707E+00 | loss scale: 16384.0 | grad norm: 123710.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1390/ 159576 | consumed samples: 22240 | elapsed time per iteration (ms): 13545.5 | learning rate: 6.169E-06 | global batch size: 16 | lm loss: 7.337936E+00 | loss scale: 16384.0 | grad norm: 78178.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1391/ 159576 | consumed samples: 22256 | elapsed time per iteration (ms): 13564.6 | learning rate: 6.173E-06 | global batch size: 16 | lm loss: 7.539785E+00 | loss scale: 16384.0 | grad norm: 111563.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1392/ 159576 | consumed samples: 22272 | elapsed time per iteration (ms): 13891.4 | learning rate: 6.178E-06 | global batch size: 16 | lm loss: 7.071362E+00 | loss scale: 16384.0 | grad norm: 70647.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1393/ 159576 | consumed samples: 22288 | elapsed time per iteration (ms): 13681.2 | learning rate: 6.182E-06 | global batch size: 16 | lm loss: 7.133610E+00 | loss scale: 16384.0 | grad norm: 124103.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1394/ 159576 | consumed samples: 22304 | elapsed time per iteration (ms): 13531.0 | learning rate: 6.186E-06 | global batch size: 16 | lm loss: 7.323411E+00 | loss scale: 16384.0 | grad norm: 99951.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1395/ 159576 | consumed samples: 22320 | elapsed time per iteration (ms): 13568.0 | learning rate: 6.191E-06 | global batch size: 16 | lm loss: 7.184701E+00 | loss scale: 16384.0 | grad norm: 71905.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1396/ 159576 | consumed samples: 22336 | elapsed time per iteration (ms): 13541.4 | learning rate: 6.195E-06 | global batch size: 16 | lm loss: 7.166233E+00 | loss scale: 16384.0 | grad norm: 81874.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1397/ 159576 | consumed samples: 22352 | elapsed time per iteration (ms): 13897.4 | learning rate: 6.200E-06 | global batch size: 16 | lm loss: 7.247505E+00 | loss scale: 
16384.0 | grad norm: 84059.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1398/ 159576 | consumed samples: 22368 | elapsed time per iteration (ms): 13621.5 | learning rate: 6.204E-06 | global batch size: 16 | lm loss: 7.240150E+00 | loss scale: 16384.0 | grad norm: 119489.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1399/ 159576 | consumed samples: 22384 | elapsed time per iteration (ms): 13579.9 | learning rate: 6.209E-06 | global batch size: 16 | lm loss: 7.294222E+00 | loss scale: 16384.0 | grad norm: 80417.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1400/ 159576 | consumed samples: 22400 | elapsed time per iteration (ms): 13625.0 | learning rate: 6.213E-06 | global batch size: 16 | lm loss: 7.203695E+00 | loss scale: 16384.0 | grad norm: 97654.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1401/ 159576 | consumed samples: 22416 | elapsed time per iteration (ms): 14002.5 | learning rate: 6.217E-06 | global batch size: 16 | lm loss: 7.173908E+00 | loss scale: 16384.0 | grad norm: 72597.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1402/ 159576 | consumed samples: 22432 | elapsed time per iteration (ms): 13559.2 | learning rate: 6.222E-06 | global batch size: 16 | lm loss: 7.213487E+00 | loss scale: 16384.0 | grad norm: 108337.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1403/ 159576 | consumed samples: 22448 | elapsed time per iteration (ms): 13615.0 | learning rate: 6.226E-06 | global batch size: 16 | lm loss: 7.295056E+00 | loss scale: 16384.0 | grad norm: 109464.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1404/ 159576 | consumed samples: 22464 | elapsed time per iteration (ms): 13479.3 | learning rate: 6.231E-06 | global batch size: 16 | lm loss: 7.070762E+00 | loss scale: 16384.0 | grad norm: 70008.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1405/ 159576 | consumed samples: 22480 | elapsed time per iteration (ms): 13573.2 | learning rate: 6.235E-06 | global batch size: 16 | lm loss: 7.206651E+00 | loss scale: 16384.0 | grad norm: 71456.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1406/ 159576 | consumed samples: 22496 | elapsed time per iteration (ms): 13670.7 | learning rate: 6.240E-06 | global batch size: 16 | lm loss: 7.421339E+00 | loss scale: 16384.0 | grad norm: 81529.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1407/ 159576 | consumed samples: 22512 | elapsed time per iteration (ms): 13510.9 | learning rate: 6.244E-06 | global batch size: 16 | lm loss: 7.245395E+00 | loss scale: 16384.0 | grad norm: 120780.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1408/ 159576 | consumed samples: 22528 | elapsed time per iteration (ms): 13544.4 | learning rate: 6.249E-06 | global batch size: 16 | lm loss: 7.479702E+00 | loss scale: 16384.0 | grad norm: 98091.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1409/ 159576 | consumed samples: 22544 | elapsed time per 
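A quick consistency check on the learning-rate column: the values match a linear, sample-based warmup with a slope of roughly 2.774e-10 per consumed sample. The slope below is inferred from the log itself (iteration 1500: lr 6.657e-06 at 24000 consumed samples), not taken from the training configuration, so treat this as a sketch:

    # Hedged sanity check: logged lr ~= slope * consumed_samples during warmup.
    slope = 6.657e-06 / 24000          # inferred from iteration 1500, not from the config

    for it, logged in [(1375, 6.102e-06), (1400, 6.213e-06), (1450, 6.435e-06)]:
        consumed = 16 * it             # global batch size is fixed at 16 in this range
        print(f"iter {it}: predicted {slope * consumed:.3e}, logged {logged:.3e}")

All three predictions agree with the logged values to the printed precision. The one wrinkle is iteration 1513 further down, where the learning rate repeats 1512's value; from that point on the logged lr corresponds to one 16-sample step fewer.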
iter | ms/iter | lr        | lm loss  | grad norm
1409 | 13558.7 | 6.253E-06 | 7.220355 | 71818.367
1410 | 13949.7 | 6.257E-06 | 7.381415 | 80168.457
1411 | 13573.4 | 6.262E-06 | 7.330766 | 107261.861
1412 | 13522.9 | 6.266E-06 | 7.378265 | 115619.714
1413 | 13584.4 | 6.271E-06 | 7.202836 | 70230.767
1414 | 13797.1 | 6.275E-06 | 7.202533 | 122640.667
1415 | 13736.9 | 6.280E-06 | 7.271989 | 80706.550
1416 | 13603.3 | 6.284E-06 | 7.350783 | 106402.600
1417 | 13663.2 | 6.288E-06 | 7.629884 | 111978.514
1418 | 13512.0 | 6.293E-06 | 7.276966 | 86564.098
1419 | 13947.9 | 6.297E-06 | 7.109100 | 85621.258
1420 | 13554.6 | 6.302E-06 | 7.234724 | 115238.437
1421 | 13608.2 | 6.306E-06 | 7.134557 | 127475.605
1422 | 13564.6 | 6.311E-06 | 7.096246 | 92678.765
1423 | 13993.7 | 6.315E-06 | 7.215540 | 77823.778
1424 | 13635.8 | 6.320E-06 | 7.332169 | 88585.736
1425 | 13477.0 | 6.324E-06 | 7.224688 | 98593.171
1426 | 13579.9 | 6.328E-06 | 7.330650 | 101929.983
1427 | 13559.4 | 6.333E-06 | 7.261027 | 79893.479
1428 | 13656.6 | 6.337E-06 | 7.050019 | 197668.137
1429 | 13549.3 | 6.342E-06 | 7.283052 | 185482.345
1430 | 13566.6 | 6.346E-06 | 7.251038 | 81246.801
1431 | 13626.6 | 6.351E-06 | 7.363044 | 89555.992
1432 | 14023.4 | 6.355E-06 | 7.350190 | 151476.896
1433 | 13376.0 | 6.359E-06 | 7.294331 | 148300.162
1434 | 13594.6 | 6.364E-06 | 7.178850 | 115814.774
1435 | 13589.5 | 6.368E-06 | 7.174537 | 89057.264
1436 | 13854.5 | 6.373E-06 | 7.455090 | 143357.692
1437 | 13800.5 | 6.377E-06 | 7.230480 | 124647.889
1438 | 13574.3 | 6.382E-06 | 7.214196 | 90534.924
1439 | 13559.7 | 6.386E-06 | 7.228687 | 100823.134
1440 | 13580.1 | 6.391E-06 | 7.297411 | 72207.799
1441 | 13763.6 | 6.395E-06 | 7.403437 | 227400.170
1442 | 13606.0 | 6.399E-06 | 7.267770 | 178424.275
1443 | 13579.5 | 6.404E-06 | 7.196310 | 93737.230
1444 | 13564.8 | 6.408E-06 | 7.180475 | 107567.132
1445 | 14086.1 | 6.413E-06 | 7.235699 | 90017.706
1446 | 13420.4 | 6.417E-06 | 7.131771 | 200715.783
1447 | 13582.8 | 6.422E-06 | 7.147336 | 139041.379
1448 | 13591.5 | 6.426E-06 | 7.223548 | 81314.906
1449 | 13543.2 | 6.430E-06 | 7.126436 | 104656.231
1450 | 13771.0 | 6.435E-06 | 7.239769 | 55782.887
1451 | 13581.7 | 6.439E-06 | 7.431156 | 265376.495
1452 | 13633.4 | 6.444E-06 | 7.120412 | 153821.211
1453 | 13510.9 | 6.448E-06 | 7.361814 | 91484.610
1454 | 14008.9 | 6.453E-06 | 7.429213 | 95193.402
1455 | 13534.7 | 6.457E-06 | 7.311771 | 99688.210
1456 | 13570.9 | 6.462E-06 | 7.326795 | 199002.918
1457 | 13567.6 | 6.466E-06 | 7.238305 | 148524.516
1458 | 14002.9 | 6.470E-06 | 7.170752 | 83892.787
1459 | 13758.9 | 6.475E-06 | 7.148302 | 92326.384
1460 | 13596.9 | 6.479E-06 | 7.386099 | 141912.785
1461 | 13627.4 | 6.484E-06 | 7.288848 | 170265.777
1462 | 13618.4 | 6.488E-06 | 7.229756 | 120999.804
1463 | 13656.7 | 6.493E-06 | 7.281564 | 93039.502
1464 | 13645.1 | 6.497E-06 | 7.287534 | 80620.713
1465 | 13567.3 | 6.501E-06 | 7.328496 | 125622.289
1466 | 13597.3 | 6.506E-06 | 7.289563 | 115928.663
1467 | 13941.8 | 6.510E-06 | 7.383677 | 88787.769
1468 | 13557.9 | 6.515E-06 | 7.200576 | 72136.963
1469 | 13659.8 | 6.519E-06 | 7.237146 | 80384.892
1470 | 13520.5 | 6.524E-06 | 7.087498 | 84910.064
1471 | 13587.4 | 6.528E-06 | 7.201303 | 82344.270
1472 | 13785.3 | 6.533E-06 | 7.099293 | 90694.938
1473 | 13564.5 | 6.537E-06 | 7.241871 | 49829.478
1474 | 13624.0 | 6.541E-06 | 7.157920 | 134064.505
1475 | 13651.2 | 6.546E-06 | 7.214240 | 86872.151
1476 | 14166.8 | 6.550E-06 | 7.192460 | 80848.938
1477 | 13604.7 | 6.555E-06 | 7.323776 | 70454.418
1478 | 13572.6 | 6.559E-06 | 7.268590 | 71693.339
1479 | 13608.6 | 6.564E-06 | 7.296487 | 81654.087
1480 | 14039.7 | 6.568E-06 | 7.090362 | 64201.153
1481 | 13583.2 | 6.572E-06 | 7.375229 | 113007.126
1482 | 13660.9 | 6.577E-06 | 7.293176 | 77498.464
1483 | 13614.0 | 6.581E-06 | 7.336072 | 110912.409
1484 | 13566.7 | 6.586E-06 | 7.364174 | 183688.896
1485 | 13815.4 | 6.590E-06 | 7.239150 | 72249.353
1486 | 13589.6 | 6.595E-06 | 7.200100 | 96228.791
1487 | 13607.7 | 6.599E-06 | 7.292061 | 121424.509
1488 | 13632.1 | 6.604E-06 | 7.136326 | 126581.190
1489 | 14024.4 | 6.608E-06 | 7.314082 | 81672.303
1490 | 13562.3 | 6.612E-06 | 7.220848 | 124864.436
1491 | 13573.1 | 6.617E-06 | 7.139018 | 91430.675
1492 | 13614.3 | 6.621E-06 | 7.268013 | 135716.036
1493 | 13616.6 | 6.626E-06 | 7.252588 | 83740.306
1494 | 13959.7 | 6.630E-06 | 6.975100 | 83284.334
1495 | 13605.9 | 6.635E-06 | 7.372656 | 69225.313
1496 | 13623.3 | 6.639E-06 | 7.219198 | 115429.784
1497 | 13627.9 | 6.643E-06 | 7.340521 | 85290.242
1498 | 13884.3 | 6.648E-06 | 7.186238 | 114903.673
1499 | 13657.7 | 6.652E-06 | 7.208917 | 92023.782
1500 | 13609.5 | 6.657E-06 | 7.191697 | 96438.628   (loss scale: 32768.0 from here on)

saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 08:02:02,260] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step1500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17270.67
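The loss-scale doubling from 16384.0 to 32768.0 exactly at iteration 1500 is dynamic loss scaling behaving as designed: after a full window of overflow-free steps (the log shows 0 skipped and 0 NaN iterations throughout this stretch), the scaler doubles the scale. A minimal sketch of the usual scheme, assuming a growth interval of 1000 steps; the real Megatron/DeepSpeed scaler has more knobs:

    # Simplified dynamic loss scaler (illustrative, not the actual DeepSpeed code).
    class DynamicLossScaler:
        def __init__(self, init_scale=16384.0, growth_interval=1000, backoff=0.5):
            self.scale = init_scale
            self.growth_interval = growth_interval  # overflow-free steps before doubling
            self.backoff = backoff                  # shrink factor applied on overflow
            self._good_steps = 0

        def update(self, found_overflow: bool) -> bool:
            """Return True if the optimizer step should be applied."""
            if found_overflow:
                self.scale *= self.backoff  # would be logged as a skipped iteration
                self._good_steps = 0
                return False
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= 2.0           # e.g. 16384.0 -> 32768.0, as at iteration 1500
            return True

Also note iteration 1501 below: its 31402.6 ms elapsed time is roughly a normal ~13.6 s step plus the 17.3 s checkpoint save.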
iter | ms/iter | lr        | lm loss  | grad norm
1501 | 31402.6 | 6.661E-06 | 7.276592 | 161966.188
1502 | 13594.3 | 6.666E-06 | 7.318794 | 194567.067
1503 | 13587.2 | 6.670E-06 | 7.168730 | 147316.587
1504 | 13690.3 | 6.675E-06 | 7.199265 | 160502.917
1505 | 14065.5 | 6.679E-06 | 7.004994 | 126147.516
1506 | 13542.1 | 6.683E-06 | 7.322471 | 196683.898
1507 | 13669.0 | 6.688E-06 | 7.393982 | 190898.758

[2021-09-24 08:03:56] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
[2021-09-24 08:03:56] PULSE: tr8-104B is running for 2:11:45 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
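The PULSE lines are SLURM status probes. "1162855_[2-10%1]" is a job array with tasks 2-10 still queued and a "%1" throttle, i.e. at most one array task runs at a time: the next chunk of training is scheduled automatically as soon as the running task (1162855_1, at 2:11:45 here) ends. A sketch of the kind of query such a watchdog can issue; the squeue flags are standard SLURM, but the actual PULSE script is not shown in this log:

    # Illustrative SLURM probe (not the actual PULSE implementation): list every
    # element of the training job array with its state, elapsed time and partition.
    import subprocess

    out = subprocess.run(
        ["squeue", "-j", "1162855", "-h", "-o", "%i %T %M %P"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(out)  # e.g. "1162855_1 RUNNING 2:11:45 gpu_p13"
                # and  "1162855_[2-10%1] PENDING 0:00 gpu_p13"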
iter | ms/iter | lr        | lm loss  | grad norm
1508 | 13530.1 | 6.692E-06 | 7.303823 | 138876.766
1509 | 13620.2 | 6.697E-06 | 7.181733 | 245330.128
1510 | 13857.7 | 6.701E-06 | 7.249762 | 178346.781
1511 | 13642.0 | 6.706E-06 | 7.141682 | 225502.316
1512 | 13680.2 | 6.710E-06 | 7.262461 | 152013.376
1513 |  6867.5 | 6.710E-06 | 7.117817 | 152013.376  (short step: lr and grad norm repeat 1512's values)
1514 | 13192.9 | 6.714E-06 | 7.508438 | 277772.591
1515 | 13697.2 | 6.719E-06 | 7.055306 | 184291.975
1516 | 13601.8 | 6.723E-06 | 7.364224 | 153076.917
1517 | 13603.6 | 6.728E-06 | 6.912699 | 218098.104
1518 | 13640.7 | 6.732E-06 | 7.323909 | 216972.778
1519 | 14045.8 | 6.737E-06 | 7.068207 | 118810.539
1520 | 13595.0 | 6.741E-06 | 7.160398 | 174748.456
1521 | 13611.5 | 6.746E-06 | 7.170628 | 146800.781
1522 | 13576.3 | 6.750E-06 | 7.141685 | 301970.136
1523 | 13818.0 | 6.754E-06 | 7.351134 | 203560.816
1524 | 13700.8 | 6.759E-06 | 7.291396 | 186296.459
1525 | 13611.8 | 6.763E-06 | 7.052688 | 186235.227
1526 | 13626.5 | 6.768E-06 | 7.083735 | 254298.754
1527 | 13677.9 | 6.772E-06 | 7.212967 | 290009.050
1528 | 13998.5 | 6.777E-06 | 7.249606 | 193082.466
1529 | 13543.2 | 6.781E-06 | 7.187498 | 161368.154
1530 | 13565.1 | 6.786E-06 | 7.266234 | 198639.321
1531 | 13571.4 | 6.790E-06 | 7.528541 | 545404.395
1532 | 13970.0 | 6.794E-06 | 7.212701 | 227881.927
1533 | 13566.3 | 6.799E-06 | 7.440462 | 170454.067
1534 | 13611.2 | 6.803E-06 | 7.264073 | 306199.566
1535 | 13661.5 | 6.808E-06 | 7.109380 | 130108.699
1536 | 13539.1 | 6.812E-06 | 7.475006 | 447958.462
1537 | 13698.1 | 6.817E-06 | 7.372583 | 233240.316
1538 | 13601.5 | 6.821E-06 | 7.208574 | 208866.404
1539 | 13645.6 | 6.825E-06 | 7.209548 | 290418.296
1540 | 13628.1 | 6.830E-06 | 7.168006 | 271187.490
1541 | 14103.2 | 6.834E-06 | 7.235812 | 368637.293
1542 | 13752.7 | 6.839E-06 | 7.205466 | 275606.149
1543 | 13526.0 | 6.843E-06 | 7.152663 | 186385.977
1544 | 13591.1 | 6.848E-06 | 7.402153 | 202784.884
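One pattern worth flagging when scanning the grad-norm column: typical values sit noticeably higher after iteration 1500, roughly doubling in step with the 2x loss-scale increase. At least part of that jump is therefore plausibly an artifact of how the norm is logged relative to the loss scale rather than a genuine shift in gradient magnitude, though the growing spikes (545404.395 at iteration 1531, 447958.462 at 1536) are worth watching either way.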
iter | ms/iter | lr        | lm loss  | grad norm
1545 | 13853.8 | 6.852E-06 | 7.254861 | 302847.689
1546 | 13718.3 | 6.857E-06 | 7.259928 | 190927.131
1547 | 13565.0 | 6.861E-06 | 7.226044 | 147732.617
1548 | 13562.3 | 6.865E-06 | 7.106945 | 275364.195
1549 | 13573.3 | 6.870E-06 | 7.157021 | 180244.172
1550 | 13916.8 | 6.874E-06 | 7.001479 | 268566.065
1551 | 13651.8 | 6.879E-06 | 7.167608 | 198735.053
1552 | 13608.0 | 6.883E-06 | 7.093953 | 170933.719
1553 | 13517.6 | 6.888E-06 | 7.234317 | 237231.760
1554 | 14011.1 | 6.892E-06 | 7.130560 | 237902.373
1555 | 13510.9 | 6.896E-06 | 7.275712 | 149656.891
1556 | 13617.0 | 6.901E-06 | 7.239087 | 186987.381
1557 | 13622.7 | 6.905E-06 | 6.972548 | 167404.940
1558 | 13629.7 | 6.910E-06 | 7.274665 | 170409.995
1559 | 13856.8 | 6.914E-06 | 7.320499 | 139509.403
1560 | 13572.0 | 6.919E-06 | 7.481147 | 204961.182
1561 | 13609.9 | 6.923E-06 | 7.318799 | 233741.215
1562 | 13593.5 | 6.928E-06 | 6.970228 | 159417.196
1563 | 13894.7 | 6.932E-06 | 7.266310 | 154081.846
1564 | 13687.0 | 6.936E-06 | 7.274476 | 258666.127
1565 | 13663.3 | 6.941E-06 | 7.125623 | 167968.329
1566 | 13604.1 | 6.945E-06 | 7.210727 | 198543.646
1567 | 14015.2 | 6.950E-06 | 7.245472 | 149711.382
1568 | 13524.3 | 6.954E-06 | 6.959779 | 217321.763
1569 | 13601.8 | 6.959E-06 | 7.177199 | 254297.194
1570 | 13589.9 | 6.963E-06 | 7.113214 | 172729.515
1571 | 13658.1 | 6.967E-06 | 7.054616 | 176859.362
1572 | 13798.6 | 6.972E-06 | 7.111713 | 165282.457
1573 | 13684.6 | 6.976E-06 | 7.324330 | 205395.896
1574 | 13612.3 | 6.981E-06 | 7.139562 | 201180.686
1575 | 13567.2 | 6.985E-06 | 7.063004 | 126181.509
1576 | 13982.4 | 6.990E-06 | 7.030066 | 261758.694
1577 | 13552.2 | 6.994E-06 | 7.129750 | 133747.300
1578 | 13576.0 | 6.999E-06 | 7.478085 | 193421.594
1579 | 13627.7 | 7.003E-06 | 7.062607 | 162309.186
1580 | 13870.0 | 7.007E-06 | 6.734056 | 233732.101
1581 | 13680.5 | 7.012E-06 | 7.360079 | 189405.056
1582 | 13679.9 | 7.016E-06 | 7.291443 | 159639.849
1583 | 13579.9 | 7.021E-06 | 7.361541 | 178947.980
1584 | 13614.6 | 7.025E-06 | 7.145397 | 198293.827
1585 | 13943.5 | 7.030E-06 | 7.009763 | 172995.962
1586 | 13665.6 | 7.034E-06 | 7.306109 | 193555.142
1587 | 13713.0 | 7.038E-06 | 7.341703 | 240981.196
1588 | 13685.0 | 7.043E-06 | 7.076401 | 144170.844
1589 | 13990.2 | 7.047E-06 | 7.016201 | 215101.083
1590 | 13615.2 | 7.052E-06 | 7.248097 | 183674.866
1591 | 13603.7 | 7.056E-06 | 7.274388 | 194912.772
1592 | 13589.1 | 7.061E-06 | 7.189001 | 178991.312
1593 | 13610.8 | 7.065E-06 | 7.232603 | 152962.889
1594 | 13768.0 | 7.070E-06 | 7.102697 | 144835.907
1595 | 13616.0 | 7.074E-06 | 7.124231 | 492597.129
1596 | 13671.0 | 7.078E-06 | 7.347673 | 283986.803
1597 | 13618.5 | 7.083E-06 | 7.247316 | 185319.173
1598 | 14074.4 | 7.087E-06 | 7.152137 | 179820.746
1599 | 13609.5 | 7.092E-06 | 7.087896 | 178653.073
1600 | 13606.5 | 7.096E-06 | 7.094335 | 197442.311
1601 | 13605.3 | 7.101E-06 | 7.230387 | 277453.177
1602 | 14026.8 | 7.105E-06 | 7.399794 | 202190.175
1603 | 13782.5 | 7.109E-06 |      ... | ...
loss: 7.261839E+00 | loss scale: 32768.0 | grad norm: 162395.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1604/ 159576 | consumed samples: 25664 | elapsed time per iteration (ms): 13652.4 | learning rate: 7.114E-06 | global batch size: 16 | lm loss: 7.202652E+00 | loss scale: 32768.0 | grad norm: 199798.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1605/ 159576 | consumed samples: 25680 | elapsed time per iteration (ms): 13537.9 | learning rate: 7.118E-06 | global batch size: 16 | lm loss: 7.002069E+00 | loss scale: 32768.0 | grad norm: 200932.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1606/ 159576 | consumed samples: 25696 | elapsed time per iteration (ms): 13623.9 | learning rate: 7.123E-06 | global batch size: 16 | lm loss: 6.994870E+00 | loss scale: 32768.0 | grad norm: 182105.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1607/ 159576 | consumed samples: 25712 | elapsed time per iteration (ms): 13778.9 | learning rate: 7.127E-06 | global batch size: 16 | lm loss: 7.236290E+00 | loss scale: 32768.0 | grad norm: 210525.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1608/ 159576 | consumed samples: 25728 | elapsed time per iteration (ms): 13614.0 | learning rate: 7.132E-06 | global batch size: 16 | lm loss: 7.271640E+00 | loss scale: 32768.0 | grad norm: 155104.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1609/ 159576 | consumed samples: 25744 | elapsed time per iteration (ms): 13637.4 | learning rate: 7.136E-06 | global batch size: 16 | lm loss: 7.142178E+00 | loss scale: 32768.0 | grad norm: 179013.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1610/ 159576 | consumed samples: 25760 | elapsed time per iteration (ms): 13663.2 | learning rate: 7.141E-06 | global batch size: 16 | lm loss: 7.233703E+00 | loss scale: 32768.0 | grad norm: 205415.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1611/ 159576 | consumed samples: 25776 | elapsed time per iteration (ms): 14078.6 | learning rate: 7.145E-06 | global batch size: 16 | lm loss: 7.137359E+00 | loss scale: 32768.0 | grad norm: 211115.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1612/ 159576 | consumed samples: 25792 | elapsed time per iteration (ms): 13476.7 | learning rate: 7.149E-06 | global batch size: 16 | lm loss: 7.265315E+00 | loss scale: 32768.0 | grad norm: 221323.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1613/ 159576 | consumed samples: 25808 | elapsed time per iteration (ms): 13601.4 | learning rate: 7.154E-06 | global batch size: 16 | lm loss: 7.092045E+00 | loss scale: 32768.0 | grad norm: 157009.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1614/ 159576 | consumed samples: 25824 | elapsed time per iteration (ms): 13616.6 | learning rate: 7.158E-06 | global batch size: 16 | lm loss: 7.018819E+00 | loss scale: 32768.0 | grad norm: 198533.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1615/ 159576 | 
consumed samples: 25840 | elapsed time per iteration (ms): 13623.7 | learning rate: 7.163E-06 | global batch size: 16 | lm loss: 7.280205E+00 | loss scale: 32768.0 | grad norm: 288417.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1616/ 159576 | consumed samples: 25856 | elapsed time per iteration (ms): 13877.9 | learning rate: 7.167E-06 | global batch size: 16 | lm loss: 7.224732E+00 | loss scale: 32768.0 | grad norm: 186062.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1617/ 159576 | consumed samples: 25872 | elapsed time per iteration (ms): 13663.6 | learning rate: 7.172E-06 | global batch size: 16 | lm loss: 7.238441E+00 | loss scale: 32768.0 | grad norm: 168294.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1618/ 159576 | consumed samples: 25888 | elapsed time per iteration (ms): 13675.4 | learning rate: 7.176E-06 | global batch size: 16 | lm loss: 7.159503E+00 | loss scale: 32768.0 | grad norm: 181012.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1619/ 159576 | consumed samples: 25904 | elapsed time per iteration (ms): 13559.3 | learning rate: 7.180E-06 | global batch size: 16 | lm loss: 7.125117E+00 | loss scale: 32768.0 | grad norm: 156261.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1620/ 159576 | consumed samples: 25920 | elapsed time per iteration (ms): 14141.4 | learning rate: 7.185E-06 | global batch size: 16 | lm loss: 7.312489E+00 | loss scale: 32768.0 | grad norm: 501804.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1621/ 159576 | consumed samples: 25936 | elapsed time per iteration (ms): 13619.8 | learning rate: 7.189E-06 | global batch size: 16 | lm loss: 7.144738E+00 | loss scale: 32768.0 | grad norm: 187512.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1622/ 159576 | consumed samples: 25952 | elapsed time per iteration (ms): 13623.1 | learning rate: 7.194E-06 | global batch size: 16 | lm loss: 7.036147E+00 | loss scale: 32768.0 | grad norm: 185668.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1623/ 159576 | consumed samples: 25968 | elapsed time per iteration (ms): 13626.1 | learning rate: 7.198E-06 | global batch size: 16 | lm loss: 6.981637E+00 | loss scale: 32768.0 | grad norm: 194478.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1624/ 159576 | consumed samples: 25984 | elapsed time per iteration (ms): 13916.5 | learning rate: 7.203E-06 | global batch size: 16 | lm loss: 7.098595E+00 | loss scale: 32768.0 | grad norm: 176876.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1625/ 159576 | consumed samples: 26000 | elapsed time per iteration (ms): 13897.1 | learning rate: 7.207E-06 | global batch size: 16 | lm loss: 7.024785E+00 | loss scale: 32768.0 | grad norm: 133422.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1626/ 159576 | consumed samples: 26016 | elapsed time per iteration (ms): 13553.3 | learning rate: 7.212E-06 | global batch size: 16 | lm loss: 7.101878E+00 | loss scale: 32768.0 | grad norm: 187471.535 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1627/ 159576 | consumed samples: 26032 | elapsed time per iteration (ms): 13608.6 | learning rate: 7.216E-06 | global batch size: 16 | lm loss: 7.083658E+00 | loss scale: 32768.0 | grad norm: 163022.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1628/ 159576 | consumed samples: 26048 | elapsed time per iteration (ms): 13598.7 | learning rate: 7.220E-06 | global batch size: 16 | lm loss: 7.128680E+00 | loss scale: 32768.0 | grad norm: 227341.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1629/ 159576 | consumed samples: 26064 | elapsed time per iteration (ms): 13737.0 | learning rate: 7.225E-06 | global batch size: 16 | lm loss: 7.226182E+00 | loss scale: 32768.0 | grad norm: 173557.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1630/ 159576 | consumed samples: 26080 | elapsed time per iteration (ms): 13598.4 | learning rate: 7.229E-06 | global batch size: 16 | lm loss: 7.204190E+00 | loss scale: 32768.0 | grad norm: 194336.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1631/ 159576 | consumed samples: 26096 | elapsed time per iteration (ms): 13618.5 | learning rate: 7.234E-06 | global batch size: 16 | lm loss: 7.295867E+00 | loss scale: 32768.0 | grad norm: 218111.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1632/ 159576 | consumed samples: 26112 | elapsed time per iteration (ms): 13608.1 | learning rate: 7.238E-06 | global batch size: 16 | lm loss: 7.313629E+00 | loss scale: 32768.0 | grad norm: 150755.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1633/ 159576 | consumed samples: 26128 | elapsed time per iteration (ms): 13926.3 | learning rate: 7.243E-06 | global batch size: 16 | lm loss: 7.105534E+00 | loss scale: 32768.0 | grad norm: 416417.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1634/ 159576 | consumed samples: 26144 | elapsed time per iteration (ms): 13573.4 | learning rate: 7.247E-06 | global batch size: 16 | lm loss: 7.154237E+00 | loss scale: 32768.0 | grad norm: 222886.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1635/ 159576 | consumed samples: 26160 | elapsed time per iteration (ms): 13613.9 | learning rate: 7.251E-06 | global batch size: 16 | lm loss: 7.367383E+00 | loss scale: 32768.0 | grad norm: 198928.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1636/ 159576 | consumed samples: 26176 | elapsed time per iteration (ms): 13620.0 | learning rate: 7.256E-06 | global batch size: 16 | lm loss: 7.224826E+00 | loss scale: 32768.0 | grad norm: 190490.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1637/ 159576 | consumed samples: 26192 | elapsed time per iteration (ms): 13847.4 | learning rate: 7.260E-06 | global batch size: 16 | lm loss: 7.133263E+00 | loss scale: 32768.0 | grad norm: 335044.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1638/ 159576 | consumed samples: 26208 | elapsed time per iteration (ms): 13680.4 | 
learning rate: 7.265E-06 | global batch size: 16 | lm loss: 6.991650E+00 | loss scale: 32768.0 | grad norm: 351935.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1639/ 159576 | consumed samples: 26224 | elapsed time per iteration (ms): 13603.3 | learning rate: 7.269E-06 | global batch size: 16 | lm loss: 7.261710E+00 | loss scale: 32768.0 | grad norm: 162679.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1640/ 159576 | consumed samples: 26240 | elapsed time per iteration (ms): 13643.0 | learning rate: 7.274E-06 | global batch size: 16 | lm loss: 7.243075E+00 | loss scale: 32768.0 | grad norm: 139259.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1641/ 159576 | consumed samples: 26256 | elapsed time per iteration (ms): 13685.4 | learning rate: 7.278E-06 | global batch size: 16 | lm loss: 7.347486E+00 | loss scale: 32768.0 | grad norm: 190145.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1642/ 159576 | consumed samples: 26272 | elapsed time per iteration (ms): 13709.0 | learning rate: 7.283E-06 | global batch size: 16 | lm loss: 7.168586E+00 | loss scale: 32768.0 | grad norm: 250612.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1643/ 159576 | consumed samples: 26288 | elapsed time per iteration (ms): 13686.3 | learning rate: 7.287E-06 | global batch size: 16 | lm loss: 7.042645E+00 | loss scale: 32768.0 | grad norm: 181688.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1644/ 159576 | consumed samples: 26304 | elapsed time per iteration (ms): 13617.6 | learning rate: 7.291E-06 | global batch size: 16 | lm loss: 6.992811E+00 | loss scale: 32768.0 | grad norm: 173387.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1645/ 159576 | consumed samples: 26320 | elapsed time per iteration (ms): 13588.3 | learning rate: 7.296E-06 | global batch size: 16 | lm loss: 6.948548E+00 | loss scale: 32768.0 | grad norm: 204171.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1646/ 159576 | consumed samples: 26336 | elapsed time per iteration (ms): 13943.8 | learning rate: 7.300E-06 | global batch size: 16 | lm loss: 7.227940E+00 | loss scale: 32768.0 | grad norm: 249546.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1647/ 159576 | consumed samples: 26352 | elapsed time per iteration (ms): 13526.7 | learning rate: 7.305E-06 | global batch size: 16 | lm loss: 7.150325E+00 | loss scale: 32768.0 | grad norm: 187163.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1648/ 159576 | consumed samples: 26368 | elapsed time per iteration (ms): 13689.1 | learning rate: 7.309E-06 | global batch size: 16 | lm loss: 7.017026E+00 | loss scale: 32768.0 | grad norm: 155331.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1649/ 159576 | consumed samples: 26384 | elapsed time per iteration (ms): 13592.0 | learning rate: 7.314E-06 | global batch size: 16 | lm loss: 6.946849E+00 | loss scale: 32768.0 | grad norm: 224463.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
1650 | 26400 | 13576.3 | 7.318E-06 | 7.179192E+00 | 276611.361
1651 | 26416 | 13958.1 | 7.322E-06 | 7.176366E+00 | 180366.507
1652 | 26432 | 13632.4 | 7.327E-06 | 7.206745E+00 | 135845.317
1653 | 26448 | 13613.1 | 7.331E-06 | 7.259154E+00 | 403068.502
1654 | 26464 | 13593.5 | 7.336E-06 | 7.201679E+00 | 362463.795
1655 | 26480 | 14016.8 | 7.340E-06 | 7.291797E+00 | 167369.816
1656 | 26496 | 13699.1 | 7.345E-06 | 7.091952E+00 | 165135.009
1657 | 26512 | 13569.2 | 7.349E-06 | 7.068718E+00 | 202181.410
1658 | 26528 | 13577.2 | 7.354E-06 | 7.233033E+00 | 333361.854
1659 | 26544 | 13970.5 | 7.358E-06 | 7.330973E+00 | 164401.480
1660 | 26560 | 13585.6 | 7.362E-06 | 7.127686E+00 | 165830.496
1661 | 26576 | 13601.7 | 7.367E-06 | 7.202850E+00 | 214035.250
1662 | 26592 | 13596.7 | 7.371E-06 | 7.194968E+00 | 269427.808
1663 | 26608 | 13626.2 | 7.376E-06 | 7.079875E+00 | 243204.527
1664 | 26624 | 13820.6 | 7.380E-06 | 7.253979E+00 | 184892.216
1665 | 26640 | 13606.7 | 7.385E-06 | 7.021820E+00 | 220398.877
1666 | 26656 | 13594.3 | 7.389E-06 | 7.115512E+00 | 307682.966
1667 | 26672 | 13584.1 | 7.393E-06 | 7.301219E+00 | 326739.461
1668 | 26688 | 13934.9 | 7.398E-06 | 7.091152E+00 | 179218.130
1669 | 26704 | 13576.9 | 7.402E-06 | 7.060991E+00 | 212478.902
1670 | 26720 | 13622.1 | 7.407E-06 | 7.225494E+00 | 312859.396
1671 | 26736 | 13558.9 | 7.411E-06 | 6.931543E+00 | 214910.265
1672 | 26752 | 13593.0 | 7.416E-06 | 7.111391E+00 | 167374.362
1673 | 26768 | 14083.5 | 7.420E-06 | 7.119873E+00 | 207656.393
1674 | 26784 | 13580.7 | 7.425E-06 | 7.190612E+00 | 138716.556
1675 | 26800 | 13560.5 | 7.429E-06 | 7.118540E+00 | 288523.946
1676 | 26816 | 13591.4 | 7.433E-06 | 7.228687E+00 | 184651.956
1677 | 26832 | 14019.3 | 7.438E-06 | 7.062222E+00 | 166988.550
1678 | 26848 | 13663.4 | 7.442E-06 | 7.206205E+00 | 760966.811
1679 | 26864 | 13583.3 | 7.447E-06 | 7.183750E+00 | 619056.103
1680 | 26880 | 13598.8 | 7.451E-06 | 7.188565E+00 | 363445.728
1681 | 26896 | 14083.3 | 7.456E-06 | 7.135269E+00 | 201434.725
1682 | 26912 | 13432.4 | 7.460E-06 | 7.080773E+00 | 223123.023
1683 | 26928 | 13629.9 | 7.464E-06 | 7.018581E+00 | 160716.882
1684 | 26944 | 13543.1 | 7.469E-06 | 7.045646E+00 | 319366.517
1685 | 26960 | 13556.0 | 7.473E-06 | 7.139486E+00 | 154250.022
1686 | 26976 | 13875.3 | 7.478E-06 | 7.146173E+00 | 186495.170
1687 | 26992 | 13583.8 | 7.482E-06 | 7.207047E+00 | 129574.140
1688 | 27008 | 13590.1 | 7.487E-06 | 7.150177E+00 | 310199.485
1689 | 27024 | 13636.7 | 7.491E-06 | 7.136959E+00 | 142456.264
1690 | 27040 | 13898.3 | 7.496E-06 | 6.991103E+00 | 206942.247
1691 | 27056 | 13637.0 | 7.500E-06 | 7.147140E+00 | 297164.074
1692 | 27072 | 13592.2 | 7.504E-06 | 7.166695E+00 | 174829.948
1693 | 27088 | 13634.0 | 7.509E-06 | 7.124074E+00 | 356202.604
1694 | 27104 | 13929.9 | 7.513E-06 | 7.219958E+00 | 288764.199
1695 | 27120 | 13812.8 | 7.518E-06 | 7.030488E+00 | 164638.861
1696 | 27136 | 13601.5 | 7.522E-06 | 7.288185E+00 | 241747.916
1697 | 27152 | 13619.0 | 7.527E-06 | 7.110942E+00 | 183251.862
1698 | 27168 | 13580.4 | 7.531E-06 | 7.096193E+00 | 187930.778
1699 | 27184 | 14055.7 | 7.536E-06 | 6.976962E+00 | 186599.931
1700 | 27200 | 13642.0 | 7.540E-06 | 6.916706E+00 | 212948.424
1701 | 27216 | 13615.0 | 7.544E-06 | 7.194331E+00 | 144812.346
1702 | 27232 | 13551.3 | 7.549E-06 | 7.139325E+00 | 331590.334
1703 | 27248 | 13973.8 | 7.553E-06 | 7.042914E+00 | 195366.856
1704 | 27264 | 13614.8 | 7.558E-06 | 7.087082E+00 | 217381.135
1705 | 27280 | 13611.2 | 7.562E-06 | 7.013979E+00 | 198091.797
1706 | 27296 | 13574.3 | 7.567E-06 | 7.016004E+00 | 222098.009
1707 | 27312 | 13629.3 | 7.571E-06 | 7.175000E+00 | 409215.441
1708 | 27328 | 13904.2 | 7.575E-06 | 7.071371E+00 | 273410.975
1709 | 27344 | 13558.1 | 7.580E-06 | 7.002718E+00 | 197884.964
1710 | 27360 | 13639.3 | 7.584E-06 | 7.323861E+00 | 172073.111
1711 | 27376 | 13631.6 | 7.589E-06 | 6.922392E+00 | 326721.457
1712 | 27392 | 13982.8 | 7.593E-06 | 7.148055E+00 | 280337.172
1713 | 27408 | 13635.8 | 7.598E-06 | 7.088178E+00 | 200762.506
1714 | 27424 | 13581.9 | 7.602E-06 | 7.096650E+00 | 204299.283
1715 | 27440 | 13647.6 | 7.607E-06 | 6.916616E+00 | 127407.249
1716 | 27456 | 13904.0 | 7.611E-06 | 7.066643E+00 | 371440.502
1717 | 27472 | 13717.4 | 7.615E-06 | 7.332389E+00 | 403592.093
1718 | 27488 | 13591.7 | 7.620E-06 | 7.055027E+00 | 200151.647
1719 | 27504 | 13560.8 | 7.624E-06 | 7.176567E+00 | 144423.577
1720 | 27520 | 13600.7 | 7.629E-06 | 6.984463E+00 | 303766.844
1721 | 27536 | 13892.8 | 7.633E-06 | 6.990324E+00 | 154861.936
1722 | 27552 | 13527.0 | 7.638E-06 | 7.238751E+00 | 231731.625
1723 | 27568 | 13536.8 | 7.642E-06 | 7.130395E+00 | 190824.462
1724 | 27584 | 13580.6 | 7.646E-06 | 7.182058E+00 | 266208.840
1725 | 27600 | 13961.0 | 7.651E-06 | 7.108085E+00 | 284420.360
1726 | 27616 | 13537.5 | 7.655E-06 | 7.049166E+00 | 189929.247
1727 | 27632 | 13583.4 | 7.660E-06 | 7.012967E+00 | 174720.301
1728 | 27648 | 13605.5 | 7.664E-06 | 7.237570E+00 | 194798.770
1729 | 27664 | 13552.5 | 7.669E-06 | 7.138112E+00 | 289252.424
1730 | 27680 | 14055.9 | 7.673E-06 | 7.041800E+00 | 190020.342
1731 | 27696 | 13571.4 | 7.678E-06 | 7.037878E+00 | 149538.464
1732 | 27712 | 13585.4 | 7.682E-06 | 7.179647E+00 | 151351.062
1733 | 27728 | 13582.2 | 7.686E-06 | 7.234662E+00 | 317716.715
1734 | 27744 | 14148.8 | 7.691E-06 | 7.306998E+00 | 216190.319
1735 | 27760 | 13664.2 | 7.695E-06 | 7.130812E+00 | 168041.258
1736 | 27776 | 13539.2 | 7.700E-06 | 7.164721E+00 | 189764.472
1737 | 27792 | 13580.1 | 7.704E-06 | 7.213598E+00 | 231432.124
1738 | 27808 | 13874.0 | 7.709E-06 | 7.064263E+00 | 332299.668
1739 | 27824 | 13542.8 | 7.713E-06 | 7.187717E+00 | 159503.470
1740 | 27840 | 13564.1 | 7.717E-06 | 7.212025E+00 | 275497.658
1741 | 27856 | 13584.8 | 7.722E-06 | 6.960712E+00 | 307419.828
1742 | 27872 | 13621.1 | 7.726E-06 | 7.086576E+00 | 156758.997
1743 | 27888 | 13719.9 | 7.731E-06 | 6.961288E+00 | 147761.212
1744 | 27904 | 13570.6 | 7.735E-06 | 7.320576E+00 | 309786.612
1745 | 27920 | 13600.3 | 7.740E-06 | 7.218632E+00 | 330698.583
1746 | 27936 | 13548.3 | 7.744E-06 | 7.139973E+00 | 376967.322
1747 | 27952 | 13954.3 | 7.749E-06 | 7.074110E+00 | 214147.428
1748 | 27968 | 13621.8 | 7.753E-06 | 7.254288E+00 | 128937.522
1749 | 27984 | 13626.6 | 7.757E-06 | 7.009082E+00 | 392446.478
1750 | 28000 | 13590.6 | 7.762E-06 | 6.949193E+00 | 205911.332
1751 | 28016 | 13916.9 | 7.766E-06 | 7.175614E+00 | 181359.266
1752 | 28032 | 13747.5 | 7.771E-06 | 7.084972E+00 | 191810.333
1753 | 28048 | 13591.1 | 7.775E-06 | 7.125815E+00 | 150833.632
1754 | 28064 | 13552.4 | 7.780E-06 | 7.096021E+00 | 858159.626
1755 | 28080 | 13586.8 | 7.784E-06 | 7.401230E+00 | 1015122.062
1756 | 28096 | 14062.7 | 7.788E-06 | 7.141807E+00 | 241473.375
1757 | 28112 | 13654.9 | 7.793E-06 | 7.055682E+00 | 195258.121
1758 | 28128 | 13576.6 | 7.797E-06 | 6.887124E+00 | 209948.309
1759 | 28144 | 13615.8 | 7.802E-06 | 7.008955E+00 | 218109.807
1760 | 28160 | 13880.5 | 7.806E-06 | 7.156555E+00 | 199049.119
1761 | 28176 | 13559.3 | 7.811E-06 | 7.445184E+00 | 571721.433
1762 | 28192 | 13597.9 | 7.815E-06 | 7.408930E+00 | 477324.031
1763 | 28208 | 13646.1 | 7.820E-06 | 7.228862E+00 | 183806.995
1764 | 28224 | 13595.0 | 7.824E-06 | 7.213759E+00 | 199120.863
1765 | 28240 | 13787.5 | 7.828E-06 | 7.190694E+00 | 230903.246
1766 | 28256 | 13655.5 | 7.833E-06 | 7.120300E+00 | 189933.714
1767 | 28272 | 13622.2 | 7.837E-06 | 7.216377E+00 | 411494.652
1768 | 28288 | 13542.4 | 7.842E-06 | 7.296084E+00 | 211545.292
1769 | 28304 | 13835.8 | 7.846E-06 | 7.014185E+00 | 159627.354
1770 | 28320 | 13625.8 | 7.851E-06 | 7.260840E+00 | 230760.798

[2021-09-24 09:03:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 09:03:50] PULSE: tr8-104B is running for 3:11:39 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))

1771 | 28336 | 13609.6 | 7.855E-06 | 7.096549E+00 | 208126.291
1772 | 28352 | 13612.5 | 7.859E-06 | 7.288601E+00 | 299861.795
1773 | 28368 | 14036.1 | 7.864E-06 | 7.006525E+00 | 221185.737
1774 | 28384 | 13455.1 | 7.868E-06 | 7.057816E+00 | 211669.427
1775 | 28400 | 13580.5 | 7.873E-06 | 7.225205E+00 | 232985.961
1776 | 28416 | 13577.7 | 7.877E-06 | 7.090505E+00 | 148862.985
1777 | 28432 | 13633.9 | 7.882E-06 | 7.291343E+00 | 241931.207
1778 | 28448 | 13810.9 | 7.886E-06 | 7.168088E+00 | 186155.211
1779 | 28464 | 13677.6 | 7.891E-06 | 6.975587E+00 | 141385.386
1780 | 28480 | 13699.5 | 7.895E-06 | 7.234455E+00 | 167275.043
1781 | 28496 | 13560.1 | 7.899E-06 | 7.118816E+00 | 185745.557
1782 | 28512 | 14007.0 | 7.904E-06 | 7.325441E+00 | 151237.535
1783 | 28528 | 13468.4 | 7.908E-06 | 6.976577E+00 | 157950.458
1784 | 28544 | 13610.8 | 7.913E-06 | 7.151215E+00 | 185745.960
1785 | 28560 | 13574.9 | 7.917E-06 | 6.982706E+00 | 212394.757
1786 | 28576 | 13593.1 | 7.922E-06 | 7.090255E+00 | 165476.788
1787 | 28592 | 13825.7 | 7.926E-06 | 7.190539E+00 | 105058.438
1788 | 28608 | 13613.9 | 7.930E-06 | 6.849520E+00 | 180790.521
1789 | 28624 | 13633.8 | 7.935E-06 | 7.203046E+00 | 126112.335
1790 | 28640 | 13618.2 | 7.939E-06 | 7.073618E+00 | 138120.801
1791 | 28656 | 14044.8 | 7.944E-06 | 7.193256E+00 | 127392.206
1792 | 28672 | 13675.9 | 7.948E-06 | 7.182660E+00 | 128828.190
1793 | 28688 | 13639.0 | 7.953E-06 | 7.029709E+00 | 123453.201
1794 | 28704 | 13728.8 | 7.957E-06 | 7.166730E+00 | 117050.511
1795 | 28720 | 13951.0 | 7.962E-06 | 7.100776E+00 | 166379.571
1796 | 28736 | 13626.1 | 7.966E-06 | 7.059687E+00 | 165877.869
1797 | 28752 | 13658.2 | 7.970E-06 | 7.128800E+00 | 241870.659
1798 | 28768 | 13547.6 | 7.975E-06 | 6.884446E+00 | 129845.941
1799 | 28784 | 13614.6 | 7.979E-06 | 7.309677E+00 | 156206.470
1800 | 28800 | 13719.1 | 7.984E-06 | 6.891129E+00 | 130612.475
1801 | 28816 | 13709.3 | 7.988E-06 | 7.259354E+00 | 299631.068
1802 | 28832 | 13702.3 | 7.993E-06 | 7.091782E+00 | 164547.713
1803 | 28848 | 13667.9 | 7.997E-06 | 7.081347E+00 | 157884.119
1804 | 28864 | 14087.7 | 8.001E-06 | 7.043708E+00 | 179047.535
1805 | 28880 | 13636.0 | 8.006E-06 | 7.153672E+00 | 171473.191
1806 | 28896 | 13563.1 | 8.010E-06 | 7.067021E+00 | 114434.243
1807 | 28912 | 13653.6 | 8.015E-06 | 7.234491E+00 | 149275.670
1808 | 28928 | 13997.0 | 8.019E-06 | 7.015783E+00 | 179254.375
1809 | 28944 | 13813.5 | 8.024E-06 | 7.176732E+00 | 180477.986
1810 | 28960 | 13672.4 | 8.028E-06 | 6.590204E+00 |
| grad norm: 149127.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1811/ 159576 | consumed samples: 28976 | elapsed time per iteration (ms): 13741.3 | learning rate: 8.033E-06 | global batch size: 16 | lm loss: 7.100949E+00 | loss scale: 32768.0 | grad norm: 133004.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1812/ 159576 | consumed samples: 28992 | elapsed time per iteration (ms): 13598.0 | learning rate: 8.037E-06 | global batch size: 16 | lm loss: 7.268322E+00 | loss scale: 32768.0 | grad norm: 287887.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1813/ 159576 | consumed samples: 29008 | elapsed time per iteration (ms): 13826.0 | learning rate: 8.041E-06 | global batch size: 16 | lm loss: 7.048282E+00 | loss scale: 32768.0 | grad norm: 147045.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1814/ 159576 | consumed samples: 29024 | elapsed time per iteration (ms): 13651.5 | learning rate: 8.046E-06 | global batch size: 16 | lm loss: 7.168237E+00 | loss scale: 32768.0 | grad norm: 167345.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1815/ 159576 | consumed samples: 29040 | elapsed time per iteration (ms): 13646.2 | learning rate: 8.050E-06 | global batch size: 16 | lm loss: 6.976926E+00 | loss scale: 32768.0 | grad norm: 173193.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1816/ 159576 | consumed samples: 29056 | elapsed time per iteration (ms): 13708.4 | learning rate: 8.055E-06 | global batch size: 16 | lm loss: 7.173286E+00 | loss scale: 32768.0 | grad norm: 156812.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1817/ 159576 | consumed samples: 29072 | elapsed time per iteration (ms): 14056.6 | learning rate: 8.059E-06 | global batch size: 16 | lm loss: 7.191895E+00 | loss scale: 32768.0 | grad norm: 254989.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1818/ 159576 | consumed samples: 29088 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.064E-06 | global batch size: 16 | lm loss: 7.070405E+00 | loss scale: 32768.0 | grad norm: 128138.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1819/ 159576 | consumed samples: 29104 | elapsed time per iteration (ms): 13606.2 | learning rate: 8.068E-06 | global batch size: 16 | lm loss: 6.955974E+00 | loss scale: 32768.0 | grad norm: 140247.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1820/ 159576 | consumed samples: 29120 | elapsed time per iteration (ms): 13652.5 | learning rate: 8.072E-06 | global batch size: 16 | lm loss: 7.029711E+00 | loss scale: 32768.0 | grad norm: 153040.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1821/ 159576 | consumed samples: 29136 | elapsed time per iteration (ms): 13671.5 | learning rate: 8.077E-06 | global batch size: 16 | lm loss: 7.097312E+00 | loss scale: 32768.0 | grad norm: 168364.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1822/ 159576 | consumed samples: 29152 | elapsed time per 
iteration (ms): 13964.1 | learning rate: 8.081E-06 | global batch size: 16 | lm loss: 7.163728E+00 | loss scale: 32768.0 | grad norm: 143592.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1823/ 159576 | consumed samples: 29168 | elapsed time per iteration (ms): 13677.5 | learning rate: 8.086E-06 | global batch size: 16 | lm loss: 7.161910E+00 | loss scale: 32768.0 | grad norm: 232336.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1824/ 159576 | consumed samples: 29184 | elapsed time per iteration (ms): 13682.4 | learning rate: 8.090E-06 | global batch size: 16 | lm loss: 7.241871E+00 | loss scale: 32768.0 | grad norm: 136988.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1825/ 159576 | consumed samples: 29200 | elapsed time per iteration (ms): 13681.2 | learning rate: 8.095E-06 | global batch size: 16 | lm loss: 6.885506E+00 | loss scale: 32768.0 | grad norm: 147212.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1826/ 159576 | consumed samples: 29216 | elapsed time per iteration (ms): 14107.7 | learning rate: 8.099E-06 | global batch size: 16 | lm loss: 7.094235E+00 | loss scale: 32768.0 | grad norm: 210358.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1827/ 159576 | consumed samples: 29232 | elapsed time per iteration (ms): 13698.2 | learning rate: 8.104E-06 | global batch size: 16 | lm loss: 6.987474E+00 | loss scale: 32768.0 | grad norm: 200444.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1828/ 159576 | consumed samples: 29248 | elapsed time per iteration (ms): 13646.3 | learning rate: 8.108E-06 | global batch size: 16 | lm loss: 7.024292E+00 | loss scale: 32768.0 | grad norm: 144708.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1829/ 159576 | consumed samples: 29264 | elapsed time per iteration (ms): 13672.0 | learning rate: 8.112E-06 | global batch size: 16 | lm loss: 7.101940E+00 | loss scale: 32768.0 | grad norm: 137983.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1830/ 159576 | consumed samples: 29280 | elapsed time per iteration (ms): 13973.1 | learning rate: 8.117E-06 | global batch size: 16 | lm loss: 6.950300E+00 | loss scale: 32768.0 | grad norm: 228570.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1831/ 159576 | consumed samples: 29296 | elapsed time per iteration (ms): 13712.1 | learning rate: 8.121E-06 | global batch size: 16 | lm loss: 7.000825E+00 | loss scale: 32768.0 | grad norm: 204009.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1832/ 159576 | consumed samples: 29312 | elapsed time per iteration (ms): 13734.6 | learning rate: 8.126E-06 | global batch size: 16 | lm loss: 7.021888E+00 | loss scale: 32768.0 | grad norm: 168698.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1833/ 159576 | consumed samples: 29328 | elapsed time per iteration (ms): 13643.1 | learning rate: 8.130E-06 | global batch size: 16 | lm loss: 6.956877E+00 | loss scale: 32768.0 | grad norm: 139702.257 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1834/ 159576 | consumed samples: 29344 | elapsed time per iteration (ms): 13670.0 | learning rate: 8.135E-06 | global batch size: 16 | lm loss: 7.078534E+00 | loss scale: 32768.0 | grad norm: 220188.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1835/ 159576 | consumed samples: 29360 | elapsed time per iteration (ms): 13786.5 | learning rate: 8.139E-06 | global batch size: 16 | lm loss: 7.145173E+00 | loss scale: 32768.0 | grad norm: 181620.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1836/ 159576 | consumed samples: 29376 | elapsed time per iteration (ms): 13684.7 | learning rate: 8.143E-06 | global batch size: 16 | lm loss: 7.147571E+00 | loss scale: 32768.0 | grad norm: 148241.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1837/ 159576 | consumed samples: 29392 | elapsed time per iteration (ms): 13650.8 | learning rate: 8.148E-06 | global batch size: 16 | lm loss: 7.198610E+00 | loss scale: 32768.0 | grad norm: 129198.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1838/ 159576 | consumed samples: 29408 | elapsed time per iteration (ms): 13689.6 | learning rate: 8.152E-06 | global batch size: 16 | lm loss: 7.077027E+00 | loss scale: 32768.0 | grad norm: 179805.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1839/ 159576 | consumed samples: 29424 | elapsed time per iteration (ms): 14193.0 | learning rate: 8.157E-06 | global batch size: 16 | lm loss: 7.034157E+00 | loss scale: 32768.0 | grad norm: 179474.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1840/ 159576 | consumed samples: 29440 | elapsed time per iteration (ms): 13593.3 | learning rate: 8.161E-06 | global batch size: 16 | lm loss: 7.132106E+00 | loss scale: 32768.0 | grad norm: 138966.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1841/ 159576 | consumed samples: 29456 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.166E-06 | global batch size: 16 | lm loss: 7.290091E+00 | loss scale: 32768.0 | grad norm: 176321.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1842/ 159576 | consumed samples: 29472 | elapsed time per iteration (ms): 13672.3 | learning rate: 8.170E-06 | global batch size: 16 | lm loss: 7.222583E+00 | loss scale: 32768.0 | grad norm: 157190.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1843/ 159576 | consumed samples: 29488 | elapsed time per iteration (ms): 14041.0 | learning rate: 8.175E-06 | global batch size: 16 | lm loss: 7.080160E+00 | loss scale: 32768.0 | grad norm: 209951.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1844/ 159576 | consumed samples: 29504 | elapsed time per iteration (ms): 13687.6 | learning rate: 8.179E-06 | global batch size: 16 | lm loss: 7.044501E+00 | loss scale: 32768.0 | grad norm: 148871.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1845/ 159576 | consumed samples: 29520 | elapsed time per iteration (ms): 13645.6 | learning rate: 8.183E-06 | global 
batch size: 16 | lm loss: 7.157808E+00 | loss scale: 32768.0 | grad norm: 274735.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1846/ 159576 | consumed samples: 29536 | elapsed time per iteration (ms): 13730.4 | learning rate: 8.188E-06 | global batch size: 16 | lm loss: 6.885038E+00 | loss scale: 32768.0 | grad norm: 152141.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1847/ 159576 | consumed samples: 29552 | elapsed time per iteration (ms): 13619.7 | learning rate: 8.192E-06 | global batch size: 16 | lm loss: 7.235194E+00 | loss scale: 32768.0 | grad norm: 176093.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1848/ 159576 | consumed samples: 29568 | elapsed time per iteration (ms): 13886.2 | learning rate: 8.197E-06 | global batch size: 16 | lm loss: 7.254928E+00 | loss scale: 32768.0 | grad norm: 205754.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1849/ 159576 | consumed samples: 29584 | elapsed time per iteration (ms): 13743.9 | learning rate: 8.201E-06 | global batch size: 16 | lm loss: 7.040710E+00 | loss scale: 32768.0 | grad norm: 218799.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1850/ 159576 | consumed samples: 29600 | elapsed time per iteration (ms): 13589.2 | learning rate: 8.206E-06 | global batch size: 16 | lm loss: 7.048983E+00 | loss scale: 32768.0 | grad norm: 207680.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1851/ 159576 | consumed samples: 29616 | elapsed time per iteration (ms): 13643.5 | learning rate: 8.210E-06 | global batch size: 16 | lm loss: 7.264068E+00 | loss scale: 32768.0 | grad norm: 172145.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1852/ 159576 | consumed samples: 29632 | elapsed time per iteration (ms): 14007.8 | learning rate: 8.214E-06 | global batch size: 16 | lm loss: 7.091225E+00 | loss scale: 32768.0 | grad norm: 165885.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1853/ 159576 | consumed samples: 29648 | elapsed time per iteration (ms): 13621.7 | learning rate: 8.219E-06 | global batch size: 16 | lm loss: 7.004953E+00 | loss scale: 32768.0 | grad norm: 193763.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1854/ 159576 | consumed samples: 29664 | elapsed time per iteration (ms): 13705.7 | learning rate: 8.223E-06 | global batch size: 16 | lm loss: 7.337306E+00 | loss scale: 32768.0 | grad norm: 334165.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1855/ 159576 | consumed samples: 29680 | elapsed time per iteration (ms): 13688.7 | learning rate: 8.228E-06 | global batch size: 16 | lm loss: 7.088278E+00 | loss scale: 32768.0 | grad norm: 168305.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1856/ 159576 | consumed samples: 29696 | elapsed time per iteration (ms): 14064.4 | learning rate: 8.232E-06 | global batch size: 16 | lm loss: 7.075657E+00 | loss scale: 32768.0 | grad norm: 146104.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
1857/ 159576 | consumed samples: 29712 | elapsed time per iteration (ms): 13622.8 | learning rate: 8.237E-06 | global batch size: 16 | lm loss: 7.326543E+00 | loss scale: 32768.0 | grad norm: 226986.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1858/ 159576 | consumed samples: 29728 | elapsed time per iteration (ms): 13661.1 | learning rate: 8.241E-06 | global batch size: 16 | lm loss: 7.226311E+00 | loss scale: 32768.0 | grad norm: 127252.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1859/ 159576 | consumed samples: 29744 | elapsed time per iteration (ms): 13672.4 | learning rate: 8.246E-06 | global batch size: 16 | lm loss: 7.024733E+00 | loss scale: 32768.0 | grad norm: 195136.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1860/ 159576 | consumed samples: 29760 | elapsed time per iteration (ms): 13685.6 | learning rate: 8.250E-06 | global batch size: 16 | lm loss: 7.050764E+00 | loss scale: 32768.0 | grad norm: 137697.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1861/ 159576 | consumed samples: 29776 | elapsed time per iteration (ms): 13956.5 | learning rate: 8.254E-06 | global batch size: 16 | lm loss: 7.164598E+00 | loss scale: 32768.0 | grad norm: 186285.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1862/ 159576 | consumed samples: 29792 | elapsed time per iteration (ms): 13801.6 | learning rate: 8.259E-06 | global batch size: 16 | lm loss: 6.982927E+00 | loss scale: 32768.0 | grad norm: 155576.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1863/ 159576 | consumed samples: 29808 | elapsed time per iteration (ms): 13779.0 | learning rate: 8.263E-06 | global batch size: 16 | lm loss: 6.845668E+00 | loss scale: 32768.0 | grad norm: 211290.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1864/ 159576 | consumed samples: 29824 | elapsed time per iteration (ms): 13629.6 | learning rate: 8.268E-06 | global batch size: 16 | lm loss: 7.561100E+00 | loss scale: 32768.0 | grad norm: 177907.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1865/ 159576 | consumed samples: 29840 | elapsed time per iteration (ms): 14024.6 | learning rate: 8.272E-06 | global batch size: 16 | lm loss: 7.056180E+00 | loss scale: 32768.0 | grad norm: 132307.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1866/ 159576 | consumed samples: 29856 | elapsed time per iteration (ms): 13629.1 | learning rate: 8.277E-06 | global batch size: 16 | lm loss: 7.005206E+00 | loss scale: 32768.0 | grad norm: 140727.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1867/ 159576 | consumed samples: 29872 | elapsed time per iteration (ms): 13680.5 | learning rate: 8.281E-06 | global batch size: 16 | lm loss: 7.008940E+00 | loss scale: 32768.0 | grad norm: 149676.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1868/ 159576 | consumed samples: 29888 | elapsed time per iteration (ms): 13661.9 | learning rate: 8.286E-06 | global batch size: 16 | lm loss: 7.154263E+00 | loss scale: 32768.0 | grad 
norm: 181537.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1869/ 159576 | consumed samples: 29904 | elapsed time per iteration (ms): 13705.9 | learning rate: 8.290E-06 | global batch size: 16 | lm loss: 7.144859E+00 | loss scale: 32768.0 | grad norm: 156740.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1870/ 159576 | consumed samples: 29920 | elapsed time per iteration (ms): 13994.0 | learning rate: 8.294E-06 | global batch size: 16 | lm loss: 7.053184E+00 | loss scale: 32768.0 | grad norm: 209836.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1871/ 159576 | consumed samples: 29936 | elapsed time per iteration (ms): 13623.9 | learning rate: 8.299E-06 | global batch size: 16 | lm loss: 7.033763E+00 | loss scale: 32768.0 | grad norm: 173327.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1872/ 159576 | consumed samples: 29952 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.303E-06 | global batch size: 16 | lm loss: 6.990786E+00 | loss scale: 32768.0 | grad norm: 281336.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1873/ 159576 | consumed samples: 29968 | elapsed time per iteration (ms): 13694.2 | learning rate: 8.308E-06 | global batch size: 16 | lm loss: 7.073781E+00 | loss scale: 32768.0 | grad norm: 124900.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1874/ 159576 | consumed samples: 29984 | elapsed time per iteration (ms): 13905.9 | learning rate: 8.312E-06 | global batch size: 16 | lm loss: 7.112270E+00 | loss scale: 32768.0 | grad norm: 168221.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1875/ 159576 | consumed samples: 30000 | elapsed time per iteration (ms): 13703.7 | learning rate: 8.317E-06 | global batch size: 16 | lm loss: 7.233196E+00 | loss scale: 32768.0 | grad norm: 174650.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1876/ 159576 | consumed samples: 30016 | elapsed time per iteration (ms): 13702.9 | learning rate: 8.321E-06 | global batch size: 16 | lm loss: 6.967190E+00 | loss scale: 32768.0 | grad norm: 177533.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1877/ 159576 | consumed samples: 30032 | elapsed time per iteration (ms): 13717.8 | learning rate: 8.325E-06 | global batch size: 16 | lm loss: 7.208225E+00 | loss scale: 32768.0 | grad norm: 207887.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1878/ 159576 | consumed samples: 30048 | elapsed time per iteration (ms): 14066.9 | learning rate: 8.330E-06 | global batch size: 16 | lm loss: 7.077339E+00 | loss scale: 32768.0 | grad norm: 142338.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1879/ 159576 | consumed samples: 30064 | elapsed time per iteration (ms): 13776.6 | learning rate: 8.334E-06 | global batch size: 16 | lm loss: 7.113251E+00 | loss scale: 32768.0 | grad norm: 158300.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1880/ 159576 | consumed samples: 30080 | elapsed time per iteration 
(ms): 13663.2 | learning rate: 8.339E-06 | global batch size: 16 | lm loss: 6.912469E+00 | loss scale: 32768.0 | grad norm: 145353.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1881/ 159576 | consumed samples: 30096 | elapsed time per iteration (ms): 13679.1 | learning rate: 8.343E-06 | global batch size: 16 | lm loss: 7.055939E+00 | loss scale: 32768.0 | grad norm: 337973.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1882/ 159576 | consumed samples: 30112 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.348E-06 | global batch size: 16 | lm loss: 6.903512E+00 | loss scale: 32768.0 | grad norm: 240165.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1883/ 159576 | consumed samples: 30128 | elapsed time per iteration (ms): 13896.8 | learning rate: 8.352E-06 | global batch size: 16 | lm loss: 7.154733E+00 | loss scale: 32768.0 | grad norm: 145006.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1884/ 159576 | consumed samples: 30144 | elapsed time per iteration (ms): 13729.5 | learning rate: 8.357E-06 | global batch size: 16 | lm loss: 7.018287E+00 | loss scale: 32768.0 | grad norm: 447058.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1885/ 159576 | consumed samples: 30160 | elapsed time per iteration (ms): 13624.7 | learning rate: 8.361E-06 | global batch size: 16 | lm loss: 7.306771E+00 | loss scale: 32768.0 | grad norm: 269279.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1886/ 159576 | consumed samples: 30176 | elapsed time per iteration (ms): 13710.2 | learning rate: 8.365E-06 | global batch size: 16 | lm loss: 7.124641E+00 | loss scale: 32768.0 | grad norm: 184189.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1887/ 159576 | consumed samples: 30192 | elapsed time per iteration (ms): 14269.7 | learning rate: 8.370E-06 | global batch size: 16 | lm loss: 7.147641E+00 | loss scale: 32768.0 | grad norm: 240777.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1888/ 159576 | consumed samples: 30208 | elapsed time per iteration (ms): 13668.8 | learning rate: 8.374E-06 | global batch size: 16 | lm loss: 7.246544E+00 | loss scale: 32768.0 | grad norm: 221768.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1889/ 159576 | consumed samples: 30224 | elapsed time per iteration (ms): 13682.0 | learning rate: 8.379E-06 | global batch size: 16 | lm loss: 7.042133E+00 | loss scale: 32768.0 | grad norm: 453492.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1890/ 159576 | consumed samples: 30240 | elapsed time per iteration (ms): 13683.0 | learning rate: 8.383E-06 | global batch size: 16 | lm loss: 7.161106E+00 | loss scale: 32768.0 | grad norm: 191134.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1891/ 159576 | consumed samples: 30256 | elapsed time per iteration (ms): 14045.3 | learning rate: 8.388E-06 | global batch size: 16 | lm loss: 7.080533E+00 | loss scale: 32768.0 | grad norm: 226207.626 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 1892/ 159576 | consumed samples: 30272 | elapsed time per iteration (ms): 13740.4 | learning rate: 8.392E-06 | global batch size: 16 | lm loss: 6.948812E+00 | loss scale: 32768.0 | grad norm: 198329.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1893/ 159576 | consumed samples: 30288 | elapsed time per iteration (ms): 13747.4 | learning rate: 8.396E-06 | global batch size: 16 | lm loss: 7.024124E+00 | loss scale: 32768.0 | grad norm: 332574.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1894/ 159576 | consumed samples: 30304 | elapsed time per iteration (ms): 13742.5 | learning rate: 8.401E-06 | global batch size: 16 | lm loss: 7.072248E+00 | loss scale: 32768.0 | grad norm: 351090.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1895/ 159576 | consumed samples: 30320 | elapsed time per iteration (ms): 13599.9 | learning rate: 8.405E-06 | global batch size: 16 | lm loss: 6.964484E+00 | loss scale: 32768.0 | grad norm: 180676.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1896/ 159576 | consumed samples: 30336 | elapsed time per iteration (ms): 13892.1 | learning rate: 8.410E-06 | global batch size: 16 | lm loss: 7.066601E+00 | loss scale: 32768.0 | grad norm: 186229.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1897/ 159576 | consumed samples: 30352 | elapsed time per iteration (ms): 13686.6 | learning rate: 8.414E-06 | global batch size: 16 | lm loss: 6.975677E+00 | loss scale: 32768.0 | grad norm: 145844.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1898/ 159576 | consumed samples: 30368 | elapsed time per iteration (ms): 13668.1 | learning rate: 8.419E-06 | global batch size: 16 | lm loss: 7.225606E+00 | loss scale: 32768.0 | grad norm: 229819.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1899/ 159576 | consumed samples: 30384 | elapsed time per iteration (ms): 13600.0 | learning rate: 8.423E-06 | global batch size: 16 | lm loss: 7.082514E+00 | loss scale: 32768.0 | grad norm: 185081.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1900/ 159576 | consumed samples: 30400 | elapsed time per iteration (ms): 14001.2 | learning rate: 8.428E-06 | global batch size: 16 | lm loss: 7.021253E+00 | loss scale: 32768.0 | grad norm: 220377.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1901/ 159576 | consumed samples: 30416 | elapsed time per iteration (ms): 13722.2 | learning rate: 8.432E-06 | global batch size: 16 | lm loss: 7.049896E+00 | loss scale: 32768.0 | grad norm: 166889.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1902/ 159576 | consumed samples: 30432 | elapsed time per iteration (ms): 13621.3 | learning rate: 8.436E-06 | global batch size: 16 | lm loss: 6.878879E+00 | loss scale: 32768.0 | grad norm: 145213.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1903/ 159576 | consumed samples: 30448 | elapsed time per iteration (ms): 13693.3 | learning rate: 8.441E-06 | global batch size: 16 | lm 
loss: 6.981446E+00 | loss scale: 32768.0 | grad norm: 385714.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1904/ 159576 | consumed samples: 30464 | elapsed time per iteration (ms): 13924.8 | learning rate: 8.445E-06 | global batch size: 16 | lm loss: 7.065192E+00 | loss scale: 32768.0 | grad norm: 230309.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1905/ 159576 | consumed samples: 30480 | elapsed time per iteration (ms): 13762.9 | learning rate: 8.450E-06 | global batch size: 16 | lm loss: 7.016763E+00 | loss scale: 32768.0 | grad norm: 164701.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1906/ 159576 | consumed samples: 30496 | elapsed time per iteration (ms): 13644.6 | learning rate: 8.454E-06 | global batch size: 16 | lm loss: 6.935023E+00 | loss scale: 32768.0 | grad norm: 158636.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1907/ 159576 | consumed samples: 30512 | elapsed time per iteration (ms): 13659.2 | learning rate: 8.459E-06 | global batch size: 16 | lm loss: 7.008549E+00 | loss scale: 32768.0 | grad norm: 216415.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1908/ 159576 | consumed samples: 30528 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.463E-06 | global batch size: 16 | lm loss: 7.210999E+00 | loss scale: 32768.0 | grad norm: 201609.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1909/ 159576 | consumed samples: 30544 | elapsed time per iteration (ms): 13647.1 | learning rate: 8.467E-06 | global batch size: 16 | lm loss: 7.035434E+00 | loss scale: 32768.0 | grad norm: 157381.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1910/ 159576 | consumed samples: 30560 | elapsed time per iteration (ms): 13657.7 | learning rate: 8.472E-06 | global batch size: 16 | lm loss: 7.002993E+00 | loss scale: 32768.0 | grad norm: 137094.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1911/ 159576 | consumed samples: 30576 | elapsed time per iteration (ms): 13538.8 | learning rate: 8.476E-06 | global batch size: 16 | lm loss: 6.895042E+00 | loss scale: 32768.0 | grad norm: 201565.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1912/ 159576 | consumed samples: 30592 | elapsed time per iteration (ms): 13570.4 | learning rate: 8.481E-06 | global batch size: 16 | lm loss: 7.119932E+00 | loss scale: 32768.0 | grad norm: 191020.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1913/ 159576 | consumed samples: 30608 | elapsed time per iteration (ms): 13960.8 | learning rate: 8.485E-06 | global batch size: 16 | lm loss: 7.021863E+00 | loss scale: 32768.0 | grad norm: 163947.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1914/ 159576 | consumed samples: 30624 | elapsed time per iteration (ms): 13571.3 | learning rate: 8.490E-06 | global batch size: 16 | lm loss: 7.255896E+00 | loss scale: 32768.0 | grad norm: 110811.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1915/ 159576 | 
consumed samples: 30640 | elapsed time per iteration (ms): 13592.9 | learning rate: 8.494E-06 | global batch size: 16 | lm loss: 7.058972E+00 | loss scale: 32768.0 | grad norm: 226666.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1916/ 159576 | consumed samples: 30656 | elapsed time per iteration (ms): 13559.3 | learning rate: 8.499E-06 | global batch size: 16 | lm loss: 7.001413E+00 | loss scale: 32768.0 | grad norm: 155562.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1917/ 159576 | consumed samples: 30672 | elapsed time per iteration (ms): 13603.1 | learning rate: 8.503E-06 | global batch size: 16 | lm loss: 6.925358E+00 | loss scale: 32768.0 | grad norm: 153599.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1918/ 159576 | consumed samples: 30688 | elapsed time per iteration (ms): 13848.6 | learning rate: 8.507E-06 | global batch size: 16 | lm loss: 7.013722E+00 | loss scale: 32768.0 | grad norm: 151847.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1919/ 159576 | consumed samples: 30704 | elapsed time per iteration (ms): 13580.7 | learning rate: 8.512E-06 | global batch size: 16 | lm loss: 7.057837E+00 | loss scale: 32768.0 | grad norm: 149268.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1920/ 159576 | consumed samples: 30720 | elapsed time per iteration (ms): 13579.6 | learning rate: 8.516E-06 | global batch size: 16 | lm loss: 7.059657E+00 | loss scale: 32768.0 | grad norm: 211843.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1921/ 159576 | consumed samples: 30736 | elapsed time per iteration (ms): 13716.2 | learning rate: 8.521E-06 | global batch size: 16 | lm loss: 7.145122E+00 | loss scale: 32768.0 | grad norm: 158831.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1922/ 159576 | consumed samples: 30752 | elapsed time per iteration (ms): 14204.8 | learning rate: 8.525E-06 | global batch size: 16 | lm loss: 7.012016E+00 | loss scale: 32768.0 | grad norm: 142219.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1923/ 159576 | consumed samples: 30768 | elapsed time per iteration (ms): 13586.3 | learning rate: 8.530E-06 | global batch size: 16 | lm loss: 6.958722E+00 | loss scale: 32768.0 | grad norm: 147958.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1924/ 159576 | consumed samples: 30784 | elapsed time per iteration (ms): 13654.4 | learning rate: 8.534E-06 | global batch size: 16 | lm loss: 6.916204E+00 | loss scale: 32768.0 | grad norm: 168316.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1925/ 159576 | consumed samples: 30800 | elapsed time per iteration (ms): 13581.4 | learning rate: 8.538E-06 | global batch size: 16 | lm loss: 7.208139E+00 | loss scale: 32768.0 | grad norm: 186895.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1926/ 159576 | consumed samples: 30816 | elapsed time per iteration (ms): 14057.7 | learning rate: 8.543E-06 | global batch size: 16 | lm loss: 6.921901E+00 | loss scale: 32768.0 | grad norm: 136886.936 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1927/ 159576 | consumed samples: 30832 | elapsed time per iteration (ms): 13553.3 | learning rate: 8.547E-06 | global batch size: 16 | lm loss: 7.044703E+00 | loss scale: 32768.0 | grad norm: 318519.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1928/ 159576 | consumed samples: 30848 | elapsed time per iteration (ms): 13594.1 | learning rate: 8.552E-06 | global batch size: 16 | lm loss: 6.906800E+00 | loss scale: 32768.0 | grad norm: 155021.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1929/ 159576 | consumed samples: 30864 | elapsed time per iteration (ms): 13607.1 | learning rate: 8.556E-06 | global batch size: 16 | lm loss: 6.881465E+00 | loss scale: 32768.0 | grad norm: 190717.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1930/ 159576 | consumed samples: 30880 | elapsed time per iteration (ms): 13551.6 | learning rate: 8.561E-06 | global batch size: 16 | lm loss: 7.199529E+00 | loss scale: 32768.0 | grad norm: 191859.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1931/ 159576 | consumed samples: 30896 | elapsed time per iteration (ms): 13806.2 | learning rate: 8.565E-06 | global batch size: 16 | lm loss: 6.954100E+00 | loss scale: 32768.0 | grad norm: 130775.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1932/ 159576 | consumed samples: 30912 | elapsed time per iteration (ms): 13613.1 | learning rate: 8.570E-06 | global batch size: 16 | lm loss: 6.704428E+00 | loss scale: 32768.0 | grad norm: 137607.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1933/ 159576 | consumed samples: 30928 | elapsed time per iteration (ms): 13506.4 | learning rate: 8.574E-06 | global batch size: 16 | lm loss: 7.014212E+00 | loss scale: 32768.0 | grad norm: 186579.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1934/ 159576 | consumed samples: 30944 | elapsed time per iteration (ms): 13520.6 | learning rate: 8.578E-06 | global batch size: 16 | lm loss: 7.012688E+00 | loss scale: 32768.0 | grad norm: 155464.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1935/ 159576 | consumed samples: 30960 | elapsed time per iteration (ms): 13855.4 | learning rate: 8.583E-06 | global batch size: 16 | lm loss: 7.011374E+00 | loss scale: 32768.0 | grad norm: 128570.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1936/ 159576 | consumed samples: 30976 | elapsed time per iteration (ms): 13483.8 | learning rate: 8.587E-06 | global batch size: 16 | lm loss: 6.823971E+00 | loss scale: 32768.0 | grad norm: 185286.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1937/ 159576 | consumed samples: 30992 | elapsed time per iteration (ms): 13455.5 | learning rate: 8.592E-06 | global batch size: 16 | lm loss: 7.002713E+00 | loss scale: 32768.0 | grad norm: 168834.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1938/ 159576 | consumed samples: 31008 | elapsed time per iteration (ms): 13488.7 | 
learning rate: 8.596E-06 | global batch size: 16 | lm loss: 7.308265E+00 | loss scale: 32768.0 | grad norm: 113334.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1939/ 159576 | consumed samples: 31024 | elapsed time per iteration (ms): 13517.8 | learning rate: 8.601E-06 | global batch size: 16 | lm loss: 6.832065E+00 | loss scale: 32768.0 | grad norm: 143617.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1940/ 159576 | consumed samples: 31040 | elapsed time per iteration (ms): 13777.8 | learning rate: 8.605E-06 | global batch size: 16 | lm loss: 6.758460E+00 | loss scale: 32768.0 | grad norm: 131000.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1941/ 159576 | consumed samples: 31056 | elapsed time per iteration (ms): 13526.9 | learning rate: 8.609E-06 | global batch size: 16 | lm loss: 6.587332E+00 | loss scale: 32768.0 | grad norm: 133270.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1942/ 159576 | consumed samples: 31072 | elapsed time per iteration (ms): 13522.3 | learning rate: 8.614E-06 | global batch size: 16 | lm loss: 7.005889E+00 | loss scale: 32768.0 | grad norm: 169934.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1943/ 159576 | consumed samples: 31088 | elapsed time per iteration (ms): 13505.7 | learning rate: 8.618E-06 | global batch size: 16 | lm loss: 7.113358E+00 | loss scale: 32768.0 | grad norm: 147469.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1944/ 159576 | consumed samples: 31104 | elapsed time per iteration (ms): 14004.8 | learning rate: 8.623E-06 | global batch size: 16 | lm loss: 6.815184E+00 | loss scale: 32768.0 | grad norm: 129420.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1945/ 159576 | consumed samples: 31120 | elapsed time per iteration (ms): 13536.0 | learning rate: 8.627E-06 | global batch size: 16 | lm loss: 6.802580E+00 | loss scale: 32768.0 | grad norm: 206454.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1946/ 159576 | consumed samples: 31136 | elapsed time per iteration (ms): 13571.2 | learning rate: 8.632E-06 | global batch size: 16 | lm loss: 6.899452E+00 | loss scale: 32768.0 | grad norm: 159625.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1947/ 159576 | consumed samples: 31152 | elapsed time per iteration (ms): 13512.7 | learning rate: 8.636E-06 | global batch size: 16 | lm loss: 6.902468E+00 | loss scale: 32768.0 | grad norm: 161374.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1948/ 159576 | consumed samples: 31168 | elapsed time per iteration (ms): 13965.3 | learning rate: 8.641E-06 | global batch size: 16 | lm loss: 7.027518E+00 | loss scale: 32768.0 | grad norm: 141898.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1949/ 159576 | consumed samples: 31184 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.645E-06 | global batch size: 16 | lm loss: 6.901030E+00 | loss scale: 32768.0 | grad norm: 115156.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 1950/ 159576 | consumed samples: 31200 | elapsed time per iteration (ms): 13549.7 | learning rate: 8.649E-06 | global batch size: 16 | lm loss: 7.012411E+00 | loss scale: 32768.0 | grad norm: 364327.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1951/ 159576 | consumed samples: 31216 | elapsed time per iteration (ms): 13460.7 | learning rate: 8.654E-06 | global batch size: 16 | lm loss: 6.996010E+00 | loss scale: 32768.0 | grad norm: 265923.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1952/ 159576 | consumed samples: 31232 | elapsed time per iteration (ms): 13574.9 | learning rate: 8.658E-06 | global batch size: 16 | lm loss: 7.002955E+00 | loss scale: 32768.0 | grad norm: 147080.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1953/ 159576 | consumed samples: 31248 | elapsed time per iteration (ms): 13782.5 | learning rate: 8.663E-06 | global batch size: 16 | lm loss: 6.930263E+00 | loss scale: 32768.0 | grad norm: 190217.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1954/ 159576 | consumed samples: 31264 | elapsed time per iteration (ms): 13515.2 | learning rate: 8.667E-06 | global batch size: 16 | lm loss: 6.835277E+00 | loss scale: 32768.0 | grad norm: 254678.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1955/ 159576 | consumed samples: 31280 | elapsed time per iteration (ms): 13569.3 | learning rate: 8.672E-06 | global batch size: 16 | lm loss: 7.283230E+00 | loss scale: 32768.0 | grad norm: 137167.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1956/ 159576 | consumed samples: 31296 | elapsed time per iteration (ms): 13592.0 | learning rate: 8.676E-06 | global batch size: 16 | lm loss: 6.895840E+00 | loss scale: 32768.0 | grad norm: 198657.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1957/ 159576 | consumed samples: 31312 | elapsed time per iteration (ms): 13906.4 | learning rate: 8.680E-06 | global batch size: 16 | lm loss: 7.127283E+00 | loss scale: 32768.0 | grad norm: 242163.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1958/ 159576 | consumed samples: 31328 | elapsed time per iteration (ms): 13647.9 | learning rate: 8.685E-06 | global batch size: 16 | lm loss: 7.022318E+00 | loss scale: 32768.0 | grad norm: 179227.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1959/ 159576 | consumed samples: 31344 | elapsed time per iteration (ms): 13668.0 | learning rate: 8.689E-06 | global batch size: 16 | lm loss: 7.021772E+00 | loss scale: 32768.0 | grad norm: 223437.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1960/ 159576 | consumed samples: 31360 | elapsed time per iteration (ms): 13699.2 | learning rate: 8.694E-06 | global batch size: 16 | lm loss: 7.270517E+00 | loss scale: 32768.0 | grad norm: 166965.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1961/ 159576 | consumed samples: 31376 | elapsed time per iteration (ms): 13595.5 | learning rate: 8.698E-06 | global batch size: 16 | lm loss: 
6.963766E+00 | loss scale: 32768.0 | grad norm: 257581.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1962/ 159576 | consumed samples: 31392 | elapsed time per iteration (ms): 13818.3 | learning rate: 8.703E-06 | global batch size: 16 | lm loss: 6.847409E+00 | loss scale: 32768.0 | grad norm: 162709.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1963/ 159576 | consumed samples: 31408 | elapsed time per iteration (ms): 13645.3 | learning rate: 8.707E-06 | global batch size: 16 | lm loss: 6.902783E+00 | loss scale: 32768.0 | grad norm: 186486.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1964/ 159576 | consumed samples: 31424 | elapsed time per iteration (ms): 13637.0 | learning rate: 8.712E-06 | global batch size: 16 | lm loss: 7.112407E+00 | loss scale: 32768.0 | grad norm: 234566.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1965/ 159576 | consumed samples: 31440 | elapsed time per iteration (ms): 13632.5 | learning rate: 8.716E-06 | global batch size: 16 | lm loss: 6.965158E+00 | loss scale: 32768.0 | grad norm: 162405.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1966/ 159576 | consumed samples: 31456 | elapsed time per iteration (ms): 13923.2 | learning rate: 8.720E-06 | global batch size: 16 | lm loss: 7.162685E+00 | loss scale: 32768.0 | grad norm: 160740.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1967/ 159576 | consumed samples: 31472 | elapsed time per iteration (ms): 13722.5 | learning rate: 8.725E-06 | global batch size: 16 | lm loss: 6.822609E+00 | loss scale: 32768.0 | grad norm: 163162.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1968/ 159576 | consumed samples: 31488 | elapsed time per iteration (ms): 13559.9 | learning rate: 8.729E-06 | global batch size: 16 | lm loss: 6.829067E+00 | loss scale: 32768.0 | grad norm: 148991.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1969/ 159576 | consumed samples: 31504 | elapsed time per iteration (ms): 13640.6 | learning rate: 8.734E-06 | global batch size: 16 | lm loss: 6.753247E+00 | loss scale: 32768.0 | grad norm: 174635.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1970/ 159576 | consumed samples: 31520 | elapsed time per iteration (ms): 13996.0 | learning rate: 8.738E-06 | global batch size: 16 | lm loss: 7.113372E+00 | loss scale: 32768.0 | grad norm: 278150.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1971/ 159576 | consumed samples: 31536 | elapsed time per iteration (ms): 13669.9 | learning rate: 8.743E-06 | global batch size: 16 | lm loss: 6.872749E+00 | loss scale: 32768.0 | grad norm: 176866.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1972/ 159576 | consumed samples: 31552 | elapsed time per iteration (ms): 13634.0 | learning rate: 8.747E-06 | global batch size: 16 | lm loss: 6.944706E+00 | loss scale: 32768.0 | grad norm: 145690.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1973/ 159576 | consumed 
samples: 31568 | elapsed time per iteration (ms): 13676.3 | learning rate: 8.751E-06 | global batch size: 16 | lm loss: 7.106283E+00 | loss scale: 32768.0 | grad norm: 154568.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1974/ 159576 | consumed samples: 31584 | elapsed time per iteration (ms): 13610.0 | learning rate: 8.756E-06 | global batch size: 16 | lm loss: 7.001073E+00 | loss scale: 32768.0 | grad norm: 156908.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1975/ 159576 | consumed samples: 31600 | elapsed time per iteration (ms): 13727.1 | learning rate: 8.760E-06 | global batch size: 16 | lm loss: 7.050818E+00 | loss scale: 32768.0 | grad norm: 234696.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1976/ 159576 | consumed samples: 31616 | elapsed time per iteration (ms): 13612.3 | learning rate: 8.765E-06 | global batch size: 16 | lm loss: 7.084875E+00 | loss scale: 32768.0 | grad norm: 169650.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1977/ 159576 | consumed samples: 31632 | elapsed time per iteration (ms): 13652.4 | learning rate: 8.769E-06 | global batch size: 16 | lm loss: 6.942274E+00 | loss scale: 32768.0 | grad norm: 133422.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1978/ 159576 | consumed samples: 31648 | elapsed time per iteration (ms): 13598.6 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.020503E+00 | loss scale: 32768.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1979/ 159576 | consumed samples: 31664 | elapsed time per iteration (ms): 6793.7 | learning rate: 8.774E-06 | global batch size: 16 | lm loss: 7.205068E+00 | loss scale: 16384.0 | grad norm: 191046.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1980/ 159576 | consumed samples: 31680 | elapsed time per iteration (ms): 13294.9 | learning rate: 8.778E-06 | global batch size: 16 | lm loss: 6.981399E+00 | loss scale: 16384.0 | grad norm: 88750.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1981/ 159576 | consumed samples: 31696 | elapsed time per iteration (ms): 13611.4 | learning rate: 8.783E-06 | global batch size: 16 | lm loss: 7.062120E+00 | loss scale: 16384.0 | grad norm: 98643.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1982/ 159576 | consumed samples: 31712 | elapsed time per iteration (ms): 13593.8 | learning rate: 8.787E-06 | global batch size: 16 | lm loss: 6.878181E+00 | loss scale: 16384.0 | grad norm: 67555.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1983/ 159576 | consumed samples: 31728 | elapsed time per iteration (ms): 13656.6 | learning rate: 8.791E-06 | global batch size: 16 | lm loss: 6.958256E+00 | loss scale: 16384.0 | grad norm: 79163.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 1984/ 159576 | consumed samples: 31744 | elapsed time per iteration (ms): 13863.2 | learning rate: 8.796E-06 | global batch size: 16 | lm loss: 6.850488E+00 | loss scale: 16384.0 | grad norm: 49908.825 | num zeros: 
iteration 1984/ 159576 | consumed samples: 31744 | elapsed time per iteration (ms): 13863.2 | learning rate: 8.796E-06 | global batch size: 16 | lm loss: 6.850488E+00 | loss scale: 16384.0 | grad norm: 49908.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1985/ 159576 | consumed samples: 31760 | elapsed time per iteration (ms): 13625.0 | learning rate: 8.800E-06 | global batch size: 16 | lm loss: 7.227520E+00 | loss scale: 16384.0 | grad norm: 56779.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1986/ 159576 | consumed samples: 31776 | elapsed time per iteration (ms): 13644.4 | learning rate: 8.805E-06 | global batch size: 16 | lm loss: 7.002261E+00 | loss scale: 16384.0 | grad norm: 88929.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1987/ 159576 | consumed samples: 31792 | elapsed time per iteration (ms): 13690.4 | learning rate: 8.809E-06 | global batch size: 16 | lm loss: 7.085162E+00 | loss scale: 16384.0 | grad norm: 50454.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1988/ 159576 | consumed samples: 31808 | elapsed time per iteration (ms): 13934.9 | learning rate: 8.814E-06 | global batch size: 16 | lm loss: 6.948382E+00 | loss scale: 16384.0 | grad norm: 95360.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1989/ 159576 | consumed samples: 31824 | elapsed time per iteration (ms): 13779.2 | learning rate: 8.818E-06 | global batch size: 16 | lm loss: 6.810514E+00 | loss scale: 16384.0 | grad norm: 64656.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1990/ 159576 | consumed samples: 31840 | elapsed time per iteration (ms): 13639.8 | learning rate: 8.822E-06 | global batch size: 16 | lm loss: 6.904098E+00 | loss scale: 16384.0 | grad norm: 77126.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1991/ 159576 | consumed samples: 31856 | elapsed time per iteration (ms): 13559.7 | learning rate: 8.827E-06 | global batch size: 16 | lm loss: 6.833849E+00 | loss scale: 16384.0 | grad norm: 68875.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1992/ 159576 | consumed samples: 31872 | elapsed time per iteration (ms): 13602.8 | learning rate: 8.831E-06 | global batch size: 16 | lm loss: 6.989305E+00 | loss scale: 16384.0 | grad norm: 77647.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1993/ 159576 | consumed samples: 31888 | elapsed time per iteration (ms): 13976.7 | learning rate: 8.836E-06 | global batch size: 16 | lm loss: 6.928751E+00 | loss scale: 16384.0 | grad norm: 67757.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1994/ 159576 | consumed samples: 31904 | elapsed time per iteration (ms): 13704.1 | learning rate: 8.840E-06 | global batch size: 16 | lm loss: 6.835466E+00 | loss scale: 16384.0 | grad norm: 69187.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1995/ 159576 | consumed samples: 31920 | elapsed time per iteration (ms): 13650.9 | learning rate: 8.845E-06 | global batch size: 16 | lm loss: 7.294861E+00 | loss scale: 16384.0 | grad norm: 143539.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1996/ 159576 | consumed samples: 31936 | elapsed time per iteration (ms): 13627.5 | learning rate: 8.849E-06 | global batch size: 16 | lm loss: 7.121392E+00 | loss scale: 16384.0 | grad norm: 74325.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1997/ 159576 | consumed samples: 31952 | elapsed time per iteration (ms): 13965.6 | learning rate: 8.854E-06 | global batch size: 16 | lm loss: 7.100244E+00 | loss scale: 16384.0 | grad norm: 92130.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1998/ 159576 | consumed samples: 31968 | elapsed time per iteration (ms): 13575.4 | learning rate: 8.858E-06 | global batch size: 16 | lm loss: 7.115794E+00 | loss scale: 16384.0 | grad norm: 85348.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1999/ 159576 | consumed samples: 31984 | elapsed time per iteration (ms): 13570.5 | learning rate: 8.862E-06 | global batch size: 16 | lm loss: 6.867460E+00 | loss scale: 16384.0 | grad norm: 112217.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 09:55:59,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, lr=[8.866863905325443e-06, 8.866863905325443e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 2000 loss: 6.9298 iter time (s): 0.007 samples/sec: 2456.816
iteration 2000/ 159576 | consumed samples: 32000 | elapsed time per iteration (ms): 13633.7 | learning rate: 8.867E-06 | global batch size: 16 | lm loss: 6.929829E+00 | loss scale: 16384.0 | grad norm: 73846.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 2000 | lm loss value: 7.084489E+00 | lm loss PPL: 1.193313E+03 |
------------------------------------------------------------------------------------------------
iteration 2001/ 159576 | consumed samples: 32016 | elapsed time per iteration (ms): 18999.4 | learning rate: 8.871E-06 | global batch size: 16 | lm loss: 6.882600E+00 | loss scale: 16384.0 | grad norm: 132358.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2002/ 159576 | consumed samples: 32032 | elapsed time per iteration (ms): 13626.5 | learning rate: 8.876E-06 | global batch size: 16 | lm loss: 7.231313E+00 | loss scale: 16384.0 | grad norm: 139453.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2003/ 159576 | consumed samples: 32048 | elapsed time per iteration (ms): 13687.4 | learning rate: 8.880E-06 | global batch size: 16 | lm loss: 7.034769E+00 | loss scale: 16384.0 | grad norm: 74117.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2004/ 159576 | consumed samples: 32064 | elapsed time per iteration (ms): 13579.3 | learning rate: 8.885E-06 | global batch size: 16 | lm loss: 7.053939E+00 | loss scale: 16384.0 | grad norm: 185455.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2005/ 159576 | consumed samples: 32080 | elapsed time per iteration (ms): 13617.6 | learning rate: 8.889E-06 | global batch size: 16 | lm loss: 6.871277E+00 | loss scale: 16384.0 | grad norm: 117343.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
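A quick sanity check on the step-2000 validation block above: the reported "lm loss PPL" is simply the exponential of the reported lm loss, and "skipped=2" in the DeepSpeed line is its cumulative count of optimizer steps dropped by the loss scaler, consistent with overflow events like iteration 1979 above. In Python:

    import math

    lm_loss = 7.084489        # validation lm loss at iteration 2000
    print(math.exp(lm_loss))  # ~1193.31, matching the reported PPL 1.193313E+03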
iteration 2006/ 159576 | consumed samples: 32096 | elapsed time per iteration (ms): 13892.7 | learning rate: 8.893E-06 | global batch size: 16 | lm loss: 6.839181E+00 | loss scale: 16384.0 | grad norm: 77619.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2007/ 159576 | consumed samples: 32112 | elapsed time per iteration (ms): 13580.2 | learning rate: 8.898E-06 | global batch size: 16 | lm loss: 7.031313E+00 | loss scale: 16384.0 | grad norm: 111506.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2008/ 159576 | consumed samples: 32128 | elapsed time per iteration (ms): 13652.0 | learning rate: 8.902E-06 | global batch size: 16 | lm loss: 6.763354E+00 | loss scale: 16384.0 | grad norm: 74284.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2009/ 159576 | consumed samples: 32144 | elapsed time per iteration (ms): 13663.9 | learning rate: 8.907E-06 | global batch size: 16 | lm loss: 7.173141E+00 | loss scale: 16384.0 | grad norm: 176920.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2010/ 159576 | consumed samples: 32160 | elapsed time per iteration (ms): 14071.2 | learning rate: 8.911E-06 | global batch size: 16 | lm loss: 6.940368E+00 | loss scale: 16384.0 | grad norm: 136609.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2011/ 159576 | consumed samples: 32176 | elapsed time per iteration (ms): 13641.6 | learning rate: 8.916E-06 | global batch size: 16 | lm loss: 7.348205E+00 | loss scale: 16384.0 | grad norm: 74685.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2012/ 159576 | consumed samples: 32192 | elapsed time per iteration (ms): 13599.3 | learning rate: 8.920E-06 | global batch size: 16 | lm loss: 6.813260E+00 | loss scale: 16384.0 | grad norm: 98269.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2013/ 159576 | consumed samples: 32208 | elapsed time per iteration (ms): 13658.0 | learning rate: 8.925E-06 | global batch size: 16 | lm loss: 7.088203E+00 | loss scale: 16384.0 | grad norm: 67591.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2014/ 159576 | consumed samples: 32224 | elapsed time per iteration (ms): 14073.3 | learning rate: 8.929E-06 | global batch size: 16 | lm loss: 6.925144E+00 | loss scale: 16384.0 | grad norm: 125518.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2015/ 159576 | consumed samples: 32240 | elapsed time per iteration (ms): 13531.4 | learning rate: 8.933E-06 | global batch size: 16 | lm loss: 7.150875E+00 | loss scale: 16384.0 | grad norm: 145833.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2016/ 159576 | consumed samples: 32256 | elapsed time per iteration (ms): 13718.9 | learning rate: 8.938E-06 | global batch size: 16 | lm loss: 7.058916E+00 | loss scale: 16384.0 | grad norm: 104576.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2017/ 159576 | consumed samples: 32272 | elapsed time per iteration (ms): 13660.3 | learning rate: 8.942E-06 | global batch size: 16 | lm loss: 7.075126E+00 | loss scale: 16384.0 | grad norm: 68969.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2018/ 159576 | consumed samples: 32288 | elapsed time per iteration (ms): 13657.9 | learning rate: 8.947E-06 | global batch size: 16 | lm loss: 7.021468E+00 | loss scale: 16384.0 | grad norm: 102873.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2019/ 159576 | consumed samples: 32304 | elapsed time per iteration (ms): 13864.5 | learning rate: 8.951E-06 | global batch size: 16 | lm loss: 7.182456E+00 | loss scale: 16384.0 | grad norm: 83098.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2020/ 159576 | consumed samples: 32320 | elapsed time per iteration (ms): 13595.8 | learning rate: 8.956E-06 | global batch size: 16 | lm loss: 7.201014E+00 | loss scale: 16384.0 | grad norm: 86577.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2021/ 159576 | consumed samples: 32336 | elapsed time per iteration (ms): 13656.2 | learning rate: 8.960E-06 | global batch size: 16 | lm loss: 7.021406E+00 | loss scale: 16384.0 | grad norm: 81681.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2022/ 159576 | consumed samples: 32352 | elapsed time per iteration (ms): 13573.2 | learning rate: 8.964E-06 | global batch size: 16 | lm loss: 7.084285E+00 | loss scale: 16384.0 | grad norm: 87860.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2023/ 159576 | consumed samples: 32368 | elapsed time per iteration (ms): 13983.6 | learning rate: 8.969E-06 | global batch size: 16 | lm loss: 6.934657E+00 | loss scale: 16384.0 | grad norm: 59691.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2024/ 159576 | consumed samples: 32384 | elapsed time per iteration (ms): 13601.4 | learning rate: 8.973E-06 | global batch size: 16 | lm loss: 7.007637E+00 | loss scale: 16384.0 | grad norm: 90222.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2025/ 159576 | consumed samples: 32400 | elapsed time per iteration (ms): 13711.5 | learning rate: 8.978E-06 | global batch size: 16 | lm loss: 6.979746E+00 | loss scale: 16384.0 | grad norm: 93849.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2026/ 159576 | consumed samples: 32416 | elapsed time per iteration (ms): 13699.6 | learning rate: 8.982E-06 | global batch size: 16 | lm loss: 6.934021E+00 | loss scale: 16384.0 | grad norm: 80041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2027/ 159576 | consumed samples: 32432 | elapsed time per iteration (ms): 14076.1 | learning rate: 8.987E-06 | global batch size: 16 | lm loss: 6.980267E+00 | loss scale: 16384.0 | grad norm: 62895.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2028/ 159576 | consumed samples: 32448 | elapsed time per iteration (ms): 13679.2 | learning rate: 8.991E-06 | global batch size: 16 | lm loss: 7.024888E+00 | loss scale: 16384.0 | grad norm: 52171.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2029/ 159576 | consumed samples: 32464 | elapsed time per iteration (ms): 13587.5 | learning rate: 8.996E-06 | global batch size: 16 | lm loss: 7.115479E+00 | loss scale: 16384.0 | grad norm: 102889.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2030/ 159576 | consumed samples: 32480 | elapsed time per iteration (ms): 13601.6 | learning rate: 9.000E-06 | global batch size: 16 | lm loss: 7.058015E+00 | loss scale: 16384.0 | grad norm: 59629.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2031/ 159576 | consumed samples: 32496 | elapsed time per iteration (ms): 13586.5 | learning rate: 9.004E-06 | global batch size: 16 | lm loss: 7.114190E+00 | loss scale: 16384.0 | grad norm: 71212.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2032/ 159576 | consumed samples: 32512 | elapsed time per iteration (ms): 13640.1 | learning rate: 9.009E-06 | global batch size: 16 | lm loss: 7.060964E+00 | loss scale: 16384.0 | grad norm: 64723.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2033/ 159576 | consumed samples: 32528 | elapsed time per iteration (ms): 13600.9 | learning rate: 9.013E-06 | global batch size: 16 | lm loss: 7.134828E+00 | loss scale: 16384.0 | grad norm: 56762.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2034/ 159576 | consumed samples: 32544 | elapsed time per iteration (ms): 13742.8 | learning rate: 9.018E-06 | global batch size: 16 | lm loss: 7.147020E+00 | loss scale: 16384.0 | grad norm: 116614.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2035/ 159576 | consumed samples: 32560 | elapsed time per iteration (ms): 13462.2 | learning rate: 9.022E-06 | global batch size: 16 | lm loss: 7.059257E+00 | loss scale: 16384.0 | grad norm: 95862.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2036/ 159576 | consumed samples: 32576 | elapsed time per iteration (ms): 13758.4 | learning rate: 9.027E-06 | global batch size: 16 | lm loss: 7.181802E+00 | loss scale: 16384.0 | grad norm: 60688.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2037/ 159576 | consumed samples: 32592 | elapsed time per iteration (ms): 13746.2 | learning rate: 9.031E-06 | global batch size: 16 | lm loss: 6.934449E+00 | loss scale: 16384.0 | grad norm: 71317.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2038/ 159576 | consumed samples: 32608 | elapsed time per iteration (ms): 13575.3 | learning rate: 9.036E-06 | global batch size: 16 | lm loss: 7.080134E+00 | loss scale: 16384.0 | grad norm: 57102.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2039/ 159576 | consumed samples: 32624 | elapsed time per iteration (ms): 13585.4 | learning rate: 9.040E-06 | global batch size: 16 | lm loss: 6.727631E+00 | loss scale: 16384.0 | grad norm: 74231.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2040/ 159576 | consumed samples: 32640 | elapsed time per iteration (ms): 13615.9 | learning rate: 9.044E-06 | global batch size: 16 | lm loss: 6.939786E+00 | loss scale: 16384.0 | grad norm: 75764.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.scheduler:Execution of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-24 10:05:20 CEST)" skipped: maximum number of running instances reached (3)
iteration 2041/ 159576 | consumed samples: 32656 | elapsed time per iteration (ms): 14017.0 | learning rate: 9.049E-06 | global batch size: 16 | lm loss: 6.912762E+00 | loss scale: 16384.0 | grad norm: 60359.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2042/ 159576 | consumed samples: 32672 | elapsed time per iteration (ms): 13386.8 | learning rate: 9.053E-06 | global batch size: 16 | lm loss: 6.892349E+00 | loss scale: 16384.0 | grad norm: 68369.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 10:05:52] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 10:05:52] PULSE: tr8-104B is running for 4:13:41 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 2043/ 159576 | consumed samples: 32688 | elapsed time per iteration (ms): 13496.3 | learning rate: 9.058E-06 | global batch size: 16 | lm loss: 7.106496E+00 | loss scale: 16384.0 | grad norm: 74847.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2044/ 159576 | consumed samples: 32704 | elapsed time per iteration (ms): 13461.5 | learning rate: 9.062E-06 | global batch size: 16 | lm loss: 7.101841E+00 | loss scale: 16384.0 | grad norm: 81326.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2045/ 159576 | consumed samples: 32720 | elapsed time per iteration (ms): 14029.5 | learning rate: 9.067E-06 | global batch size: 16 | lm loss: 6.818883E+00 | loss scale: 16384.0 | grad norm: 55780.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2046/ 159576 | consumed samples: 32736 | elapsed time per iteration (ms): 13528.3 | learning rate: 9.071E-06 | global batch size: 16 | lm loss: 7.344654E+00 | loss scale: 16384.0 | grad norm: 85807.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2047/ 159576 | consumed samples: 32752 | elapsed time per iteration (ms): 13633.2 | learning rate: 9.075E-06 | global batch size: 16 | lm loss: 7.041794E+00 | loss scale: 16384.0 | grad norm: 68040.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2048/ 159576 | consumed samples: 32768 | elapsed time per iteration (ms): 13714.3 | learning rate: 9.080E-06 | global batch size: 16 | lm loss: 7.051764E+00 | loss scale: 16384.0 | grad norm: 54860.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2049/ 159576 | consumed samples: 32784 | elapsed time per iteration (ms): 13991.3 | learning rate: 9.084E-06 | global batch size: 16 | lm loss: 6.824497E+00 | loss scale: 16384.0 | grad norm: 71323.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2050/ 159576 | consumed samples: 32800 | elapsed time per iteration (ms): 13606.5 | learning rate: 9.089E-06 | global batch size: 16 | lm loss: 7.182322E+00 | loss scale: 16384.0 | grad norm: 85719.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2051/ 159576 | consumed samples: 32816 | elapsed time per iteration (ms): 13580.8 | learning rate: 9.093E-06 | global batch size: 16 | lm loss: 7.293634E+00 | loss scale: 16384.0 | grad norm: 80588.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2052/ 159576 | consumed samples: 32832 | elapsed time per iteration (ms): 13550.0 | learning rate: 9.098E-06 | global batch size: 16 | lm loss: 7.101615E+00 | loss scale: 16384.0 | grad norm: 84442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2053/ 159576 | consumed samples: 32848 | elapsed time per iteration (ms): 13599.2 | learning rate: 9.102E-06 | global batch size: 16 | lm loss: 7.037670E+00 | loss scale: 16384.0 | grad norm: 66660.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2054/ 159576 | consumed samples: 32864 | elapsed time per iteration (ms): 13845.0 | learning rate: 9.107E-06 | global batch size: 16 | lm loss: 7.019003E+00 | loss scale: 16384.0 | grad norm: 62001.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2055/ 159576 | consumed samples: 32880 | elapsed time per iteration (ms): 13669.5 | learning rate: 9.111E-06 | global batch size: 16 | lm loss: 6.911786E+00 | loss scale: 16384.0 | grad norm: 117097.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2056/ 159576 | consumed samples: 32896 | elapsed time per iteration (ms): 13595.0 | learning rate: 9.115E-06 | global batch size: 16 | lm loss: 7.090348E+00 | loss scale: 16384.0 | grad norm: 84113.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2057/ 159576 | consumed samples: 32912 | elapsed time per iteration (ms): 13602.9 | learning rate: 9.120E-06 | global batch size: 16 | lm loss: 6.805397E+00 | loss scale: 16384.0 | grad norm: 74285.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2058/ 159576 | consumed samples: 32928 | elapsed time per iteration (ms): 13938.5 | learning rate: 9.124E-06 | global batch size: 16 | lm loss: 7.156925E+00 | loss scale: 16384.0 | grad norm: 123564.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2059/ 159576 | consumed samples: 32944 | elapsed time per iteration (ms): 13535.6 | learning rate: 9.129E-06 | global batch size: 16 | lm loss: 7.097910E+00 | loss scale: 16384.0 | grad norm: 80614.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2060/ 159576 | consumed samples: 32960 | elapsed time per iteration (ms): 13561.1 | learning rate: 9.133E-06 | global batch size: 16 | lm loss: 7.173540E+00 | loss scale: 16384.0 | grad norm: 82969.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2061/ 159576 | consumed samples: 32976 | elapsed time per iteration (ms): 13641.0 | learning rate: 9.138E-06 | global batch size: 16 | lm loss: 6.963642E+00 | loss scale: 16384.0 | grad norm: 58968.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2062/ 159576 | consumed samples: 32992 | elapsed time per iteration (ms): 13737.9 | learning rate: 9.142E-06 | global batch size: 16 | lm loss: 6.932078E+00 | loss scale: 16384.0 | grad norm: 176037.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2063/ 159576 | consumed samples: 33008 | elapsed time per iteration (ms): 13779.6 | learning rate: 9.146E-06 | global batch size: 16 | lm loss: 6.904696E+00 | loss scale: 16384.0 | grad norm: 107303.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2064/ 159576 | consumed samples: 33024 | elapsed time per iteration (ms): 13634.2 | learning rate: 9.151E-06 | global batch size: 16 | lm loss: 6.834531E+00 | loss scale: 16384.0 | grad norm: 100378.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2065/ 159576 | consumed samples: 33040 | elapsed time per iteration (ms): 13654.1 | learning rate: 9.155E-06 | global batch size: 16 | lm loss: 7.101809E+00 | loss scale: 16384.0 | grad norm: 100637.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2066/ 159576 | consumed samples: 33056 | elapsed time per iteration (ms): 13496.2 | learning rate: 9.160E-06 | global batch size: 16 | lm loss: 6.822946E+00 | loss scale: 16384.0 | grad norm: 72463.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2067/ 159576 | consumed samples: 33072 | elapsed time per iteration (ms): 14117.2 | learning rate: 9.164E-06 | global batch size: 16 | lm loss: 7.133995E+00 | loss scale: 16384.0 | grad norm: 265928.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2068/ 159576 | consumed samples: 33088 | elapsed time per iteration (ms): 13658.0 | learning rate: 9.169E-06 | global batch size: 16 | lm loss: 7.058832E+00 | loss scale: 16384.0 | grad norm: 225451.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2069/ 159576 | consumed samples: 33104 | elapsed time per iteration (ms): 13647.8 | learning rate: 9.173E-06 | global batch size: 16 | lm loss: 6.733691E+00 | loss scale: 16384.0 | grad norm: 109352.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2070/ 159576 | consumed samples: 33120 | elapsed time per iteration (ms): 13662.1 | learning rate: 9.178E-06 | global batch size: 16 | lm loss: 7.330385E+00 | loss scale: 16384.0 | grad norm: 106190.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2071/ 159576 | consumed samples: 33136 | elapsed time per iteration (ms): 14047.9 | learning rate: 9.182E-06 | global batch size: 16 | lm loss: 6.902629E+00 | loss scale: 16384.0 | grad norm: 105263.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2072/ 159576 | consumed samples: 33152 | elapsed time per iteration (ms): 13604.8 | learning rate: 9.186E-06 | global batch size: 16 | lm loss: 7.059223E+00 | loss scale: 16384.0 | grad norm: 156071.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2073/ 159576 | consumed samples: 33168 | elapsed time per iteration (ms): 13509.3 | learning rate: 9.191E-06 | global batch size: 16 | lm loss: 6.858756E+00 | loss scale: 16384.0 | grad norm: 183069.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2074/ 159576 | consumed samples: 33184 | elapsed time per iteration (ms): 13577.0 | learning rate: 9.195E-06 | global batch size: 16 | lm loss: 7.137619E+00 | loss scale: 16384.0 | grad norm: 165868.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2075/ 159576 | consumed samples: 33200 | elapsed time per iteration (ms): 13598.1 | learning rate: 9.200E-06 | global batch size: 16 | lm loss: 7.105383E+00 | loss scale: 16384.0 | grad norm: 81641.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2076/ 159576 | consumed samples: 33216 | elapsed time per iteration (ms): 13844.7 | learning rate: 9.204E-06 | global batch size: 16 | lm loss: 6.954556E+00 | loss scale: 16384.0 | grad norm: 90347.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2077/ 159576 | consumed samples: 33232 | elapsed time per iteration (ms): 13642.3 | learning rate: 9.209E-06 | global batch size: 16 | lm loss: 6.986308E+00 | loss scale: 16384.0 | grad norm: 71161.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2078/ 159576 | consumed samples: 33248 | elapsed time per iteration (ms): 13714.7 | learning rate: 9.213E-06 | global batch size: 16 | lm loss: 7.186345E+00 | loss scale: 16384.0 | grad norm: 125006.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2079/ 159576 | consumed samples: 33264 | elapsed time per iteration (ms): 13724.6 | learning rate: 9.217E-06 | global batch size: 16 | lm loss: 7.046529E+00 | loss scale: 16384.0 | grad norm: 72474.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2080/ 159576 | consumed samples: 33280 | elapsed time per iteration (ms): 13823.6 | learning rate: 9.222E-06 | global batch size: 16 | lm loss: 6.926587E+00 | loss scale: 16384.0 | grad norm: 72628.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2081/ 159576 | consumed samples: 33296 | elapsed time per iteration (ms): 13659.2 | learning rate: 9.226E-06 | global batch size: 16 | lm loss: 6.850713E+00 | loss scale: 16384.0 | grad norm: 78040.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2082/ 159576 | consumed samples: 33312 | elapsed time per iteration (ms): 13653.7 | learning rate: 9.231E-06 | global batch size: 16 | lm loss: 7.014567E+00 | loss scale: 16384.0 | grad norm: 88063.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2083/ 159576 | consumed samples: 33328 | elapsed time per iteration (ms): 13690.1 | learning rate: 9.235E-06 | global batch size: 16 | lm loss: 6.964838E+00 | loss scale: 16384.0 | grad norm: 68577.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2084/ 159576 | consumed samples: 33344 | elapsed time per iteration (ms): 14064.9 | learning rate: 9.240E-06 | global batch size: 16 | lm loss: 6.954602E+00 | loss scale: 16384.0 | grad norm: 70285.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2085/ 159576 | consumed samples: 33360 | elapsed time per iteration (ms): 13835.0 | learning rate: 9.244E-06 | global batch size: 16 | lm loss: 6.952052E+00 | loss scale: 16384.0 | grad norm: 85673.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2086/ 159576 | consumed samples: 33376 | elapsed time per iteration (ms): 13813.8 | learning rate: 9.249E-06 | global batch size: 16 | lm loss: 6.909387E+00 | loss scale: 16384.0 | grad norm: 118966.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2087/ 159576 | consumed samples: 33392 | elapsed time per iteration (ms): 13678.6 | learning rate: 9.253E-06 | global batch size: 16 | lm loss: 6.961540E+00 | loss scale: 16384.0 | grad norm: 66329.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2088/ 159576 | consumed samples: 33408 | elapsed time per iteration (ms): 13699.4 | learning rate: 9.257E-06 | global batch size: 16 | lm loss: 7.038545E+00 | loss scale: 16384.0 | grad norm: 77147.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2089/ 159576 | consumed samples: 33424 | elapsed time per iteration (ms): 13870.3 | learning rate: 9.262E-06 | global batch size: 16 | lm loss: 6.829208E+00 | loss scale: 16384.0 | grad norm: 66850.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2090/ 159576 | consumed samples: 33440 | elapsed time per iteration (ms): 13553.2 | learning rate: 9.266E-06 | global batch size: 16 | lm loss: 6.885040E+00 | loss scale: 16384.0 | grad norm: 63418.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2091/ 159576 | consumed samples: 33456 | elapsed time per iteration (ms): 13563.4 | learning rate: 9.271E-06 | global batch size: 16 | lm loss: 7.227287E+00 | loss scale: 16384.0 | grad norm: 99229.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2092/ 159576 | consumed samples: 33472 | elapsed time per iteration (ms): 13616.1 | learning rate: 9.275E-06 | global batch size: 16 | lm loss: 7.151490E+00 | loss scale: 16384.0 | grad norm: 77793.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2093/ 159576 | consumed samples: 33488 | elapsed time per iteration (ms): 14020.5 | learning rate: 9.280E-06 | global batch size: 16 | lm loss: 6.956719E+00 | loss scale: 16384.0 | grad norm: 71078.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2094/ 159576 | consumed samples: 33504 | elapsed time per iteration (ms): 13583.2 | learning rate: 9.284E-06 | global batch size: 16 | lm loss: 6.863022E+00 | loss scale: 16384.0 | grad norm: 75874.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2095/ 159576 | consumed samples: 33520 | elapsed time per iteration (ms): 13540.7 | learning rate: 9.288E-06 | global batch size: 16 | lm loss: 7.230942E+00 | loss scale: 16384.0 | grad norm: 66376.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2096/ 159576 | consumed samples: 33536 | elapsed time per iteration (ms): 13617.6 | learning rate: 9.293E-06 | global batch size: 16 | lm loss: 6.938297E+00 | loss scale: 16384.0 | grad norm: 80597.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2097/ 159576 | consumed samples: 33552 | elapsed time per iteration (ms): 13611.2 | learning rate: 9.297E-06 | global batch size: 16 | lm loss: 6.750860E+00 | loss scale: 16384.0 | grad norm: 50768.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2098/ 159576 | consumed samples: 33568 | elapsed time per iteration (ms): 13781.0 | learning rate: 9.302E-06 | global batch size: 16 | lm loss: 6.866726E+00 | loss scale: 16384.0 | grad norm: 120258.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2099/ 159576 | consumed samples: 33584 | elapsed time per iteration (ms): 13657.4 | learning rate: 9.306E-06 | global batch size: 16 | lm loss: 6.825637E+00 | loss scale: 16384.0 | grad norm: 95301.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2100/ 159576 | consumed samples: 33600 | elapsed time per iteration (ms): 13666.9 | learning rate: 9.311E-06 | global batch size: 16 | lm loss: 6.864701E+00 | loss scale: 16384.0 | grad norm: 68908.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2101/ 159576 | consumed samples: 33616 | elapsed time per iteration (ms): 13629.3 | learning rate: 9.315E-06 | global batch size: 16 | lm loss: 6.992301E+00 | loss scale: 16384.0 | grad norm: 74768.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2102/ 159576 | consumed samples: 33632 | elapsed time per iteration (ms): 14067.7 | learning rate: 9.320E-06 | global batch size: 16 | lm loss: 7.044778E+00 | loss scale: 16384.0 | grad norm: 118054.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2103/ 159576 | consumed samples: 33648 | elapsed time per iteration (ms): 13615.1 | learning rate: 9.324E-06 | global batch size: 16 | lm loss: 7.033617E+00 | loss scale: 16384.0 | grad norm: 69826.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2104/ 159576 | consumed samples: 33664 | elapsed time per iteration (ms): 13577.5 | learning rate: 9.328E-06 | global batch size: 16 | lm loss: 6.970243E+00 | loss scale: 16384.0 | grad norm: 88873.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2105/ 159576 | consumed samples: 33680 | elapsed time per iteration (ms): 13581.9 | learning rate: 9.333E-06 | global batch size: 16 | lm loss: 6.917067E+00 | loss scale: 16384.0 | grad norm: 93657.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2106/ 159576 | consumed samples: 33696 | elapsed time per iteration (ms): 14007.1 | learning rate: 9.337E-06 | global batch size: 16 | lm loss: 7.027580E+00 | loss scale: 16384.0 | grad norm: 62511.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2107/ 159576 | consumed samples: 33712 | elapsed time per iteration (ms): 13598.0 | learning rate: 9.342E-06 | global batch size: 16 | lm loss: 7.132909E+00 | loss scale: 16384.0 | grad norm: 177960.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2108/ 159576 | consumed samples: 33728 | elapsed time per iteration (ms): 13635.0 | learning rate: 9.346E-06 | global batch size: 16 | lm loss: 7.048873E+00 | loss scale: 16384.0 | grad norm: 122116.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2109/ 159576 | consumed samples: 33744 | elapsed time per iteration (ms): 13663.3 | learning rate: 9.351E-06 | global batch size: 16 | lm loss: 6.996678E+00 | loss scale: 16384.0 | grad norm: 85763.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2110/ 159576 | consumed samples: 33760 | elapsed time per iteration (ms): 13680.8 | learning rate: 9.355E-06 | global batch size: 16 | lm loss: 6.889836E+00 | loss scale: 16384.0 | grad norm: 84089.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2111/ 159576 | consumed samples: 33776 | elapsed time per iteration (ms): 13628.5 | learning rate: 9.359E-06 | global batch size: 16 | lm loss: 6.968468E+00 | loss scale: 16384.0 | grad norm: 51256.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2112/ 159576 | consumed samples: 33792 | elapsed time per iteration (ms): 13610.9 | learning rate: 9.364E-06 | global batch size: 16 | lm loss: 6.917239E+00 | loss scale: 16384.0 | grad norm: 126008.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2113/ 159576 | consumed samples: 33808 | elapsed time per iteration (ms): 13593.1 | learning rate: 9.368E-06 | global batch size: 16 | lm loss: 6.871556E+00 | loss scale: 16384.0 | grad norm: 67758.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2114/ 159576 | consumed samples: 33824 | elapsed time per iteration (ms): 13663.1 | learning rate: 9.373E-06 | global batch size: 16 | lm loss: 6.927833E+00 | loss scale: 16384.0 | grad norm: 85851.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2115/ 159576 | consumed samples: 33840 | elapsed time per iteration (ms): 13986.1 | learning rate: 9.377E-06 | global batch size: 16 | lm loss: 6.965062E+00 | loss scale: 16384.0 | grad norm: 65169.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2116/ 159576 | consumed samples: 33856 | elapsed time per iteration (ms): 13585.2 | learning rate: 9.382E-06 | global batch size: 16 | lm loss: 7.081017E+00 | loss scale: 16384.0 | grad norm: 73782.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2117/ 159576 | consumed samples: 33872 | elapsed time per iteration (ms): 13717.9 | learning rate: 9.386E-06 | global batch size: 16 | lm loss: 7.005242E+00 | loss scale: 16384.0 | grad norm: 125037.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2118/ 159576 | consumed samples: 33888 | elapsed time per iteration (ms): 13567.3 | learning rate: 9.391E-06 | global batch size: 16 | lm loss: 6.785961E+00 | loss scale: 16384.0 | grad norm: 74382.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2119/ 159576 | consumed samples: 33904 | elapsed time per iteration (ms): 13839.4 | learning rate: 9.395E-06 | global batch size: 16 | lm loss: 7.037541E+00 | loss scale: 16384.0 | grad norm: 61070.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2120/ 159576 | consumed samples: 33920 | elapsed time per iteration (ms): 13840.1 | learning rate: 9.399E-06 | global batch size: 16 | lm loss: 6.688106E+00 | loss scale: 16384.0 | grad norm: 77514.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2121/ 159576 | consumed samples: 33936 | elapsed time per iteration (ms): 13591.3 | learning rate: 9.404E-06 | global batch size: 16 | lm loss: 6.965182E+00 | loss scale: 16384.0 | grad norm: 85559.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2122/ 159576 | consumed samples: 33952 | elapsed time per iteration (ms): 13658.1 | learning rate: 9.408E-06 | global batch size: 16 | lm loss: 6.891047E+00 | loss scale: 16384.0 | grad norm: 84454.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2123/ 159576 | consumed samples: 33968 | elapsed time per iteration (ms): 13650.8 | learning rate: 9.413E-06 | global batch size: 16 | lm loss: 6.784370E+00 | loss scale: 16384.0 | grad norm: 74803.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2124/ 159576 | consumed samples: 33984 | elapsed time per iteration (ms): 13935.2 | learning rate: 9.417E-06 | global batch size: 16 | lm loss: 6.885671E+00 | loss scale: 16384.0 | grad norm: 68340.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2125/ 159576 | consumed samples: 34000 | elapsed time per iteration (ms): 13650.4 | learning rate: 9.422E-06 | global batch size: 16 | lm loss: 7.116186E+00 | loss scale: 16384.0 | grad norm: 75719.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2126/ 159576 | consumed samples: 34016 | elapsed time per iteration (ms): 13617.2 | learning rate: 9.426E-06 | global batch size: 16 | lm loss: 6.759393E+00 | loss scale: 16384.0 | grad norm: 57051.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2127/ 159576 | consumed samples: 34032 | elapsed time per iteration (ms): 13606.4 | learning rate: 9.430E-06 | global batch size: 16 | lm loss: 6.895882E+00 | loss scale: 16384.0 | grad norm: 117422.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2128/ 159576 | consumed samples: 34048 | elapsed time per iteration (ms): 13879.5 | learning rate: 9.435E-06 | global batch size: 16 | lm loss: 6.990780E+00 | loss scale: 16384.0 | grad norm: 47327.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2129/ 159576 | consumed samples: 34064 | elapsed time per iteration (ms): 13685.2 | learning rate: 9.439E-06 | global batch size: 16 | lm loss: 6.883922E+00 | loss scale: 16384.0 | grad norm: 75631.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2130/ 159576 | consumed samples: 34080 | elapsed time per iteration (ms): 13677.5 | learning rate: 9.444E-06 | global batch size: 16 | lm loss: 6.880146E+00 | loss scale: 16384.0 | grad norm: 70634.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2131/ 159576 | consumed samples: 34096 | elapsed time per iteration (ms): 13735.8 | learning rate: 9.448E-06 | global batch size: 16 | lm loss: 6.800762E+00 | loss scale: 16384.0 | grad norm: 114482.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2132/ 159576 | consumed samples: 34112 | elapsed time per iteration (ms): 13614.4 | learning rate: 9.453E-06 | global batch size: 16 | lm loss: 7.057775E+00 | loss scale: 16384.0 | grad norm: 131631.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2133/ 159576 | consumed samples: 34128 | elapsed time per iteration (ms): 13899.1 | learning rate: 9.457E-06 | global batch size: 16 | lm loss: 7.006071E+00 | loss scale: 16384.0 | grad norm: 88510.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2134/ 159576 | consumed samples: 34144 | elapsed time per iteration (ms): 13637.7 | learning rate: 9.462E-06 | global batch size: 16 | lm loss: 7.062113E+00 | loss scale: 16384.0 | grad norm: 75449.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2135/ 159576 | consumed samples: 34160 | elapsed time per iteration (ms): 13602.2 | learning rate: 9.466E-06 | global batch size: 16 | lm loss: 7.078564E+00 | loss scale: 16384.0 | grad norm: 130110.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2136/ 159576 | consumed samples: 34176 | elapsed time per iteration (ms): 13592.0 | learning rate: 9.470E-06 | global batch size: 16 | lm loss: 6.814717E+00 | loss scale: 16384.0 | grad norm: 149407.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2137/ 159576 | consumed samples: 34192 | elapsed time per iteration (ms): 14082.9 | learning rate: 9.475E-06 | global batch size: 16 | lm loss: 6.978102E+00 | loss scale: 16384.0 | grad norm: 53919.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2138/ 159576 | consumed samples: 34208 | elapsed time per iteration (ms): 13782.2 | learning rate: 9.479E-06 | global batch size: 16 | lm loss: 6.799563E+00 | loss scale: 16384.0 | grad norm: 71961.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2139/ 159576 | consumed samples: 34224 | elapsed time per iteration (ms): 13617.0 | learning rate: 9.484E-06 | global batch size: 16 | lm loss: 6.855867E+00 | loss scale: 16384.0 | grad norm: 59818.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2140/ 159576 | consumed samples: 34240 | elapsed time per iteration (ms): 13639.2 | learning rate: 9.488E-06 | global batch size: 16 | lm loss: 6.902345E+00 | loss scale: 16384.0 | grad norm: 58890.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2141/ 159576 | consumed samples: 34256 | elapsed time per iteration (ms): 13987.1 | learning rate: 9.493E-06 | global batch size: 16 | lm loss: 6.755795E+00 | loss scale: 16384.0 | grad norm: 77002.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2142/ 159576 | consumed samples: 34272 | elapsed time per iteration (ms): 13630.0 | learning rate: 9.497E-06 | global batch size: 16 | lm loss: 6.875304E+00 | loss scale: 16384.0 | grad norm: 67923.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2143/ 159576 | consumed samples: 34288 | elapsed time per iteration (ms): 13550.6 | learning rate: 9.501E-06 | global batch size: 16 | lm loss: 6.950579E+00 | loss scale: 16384.0 | grad norm: 177721.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2144/ 159576 | consumed samples: 34304 | elapsed time per iteration (ms): 13618.0 | learning rate: 9.506E-06 | global batch size: 16 | lm loss: 6.968021E+00 | loss scale: 16384.0 | grad norm: 116784.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2145/ 159576 | consumed samples: 34320 | elapsed time per iteration (ms): 13676.0 | learning rate: 9.510E-06 | global batch size: 16 | lm loss: 6.878886E+00 | loss scale: 16384.0 | grad norm: 69612.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2146/ 159576 | consumed samples: 34336 | elapsed time per iteration (ms): 13771.3 | learning rate: 9.515E-06 | global batch size: 16 | lm loss: 6.903853E+00 | loss scale: 16384.0 | grad norm: 80623.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2147/ 159576 | consumed samples: 34352 | elapsed time per iteration (ms): 13687.5 | learning rate: 9.519E-06 | global batch size: 16 | lm loss: 6.992352E+00 | loss scale: 16384.0 | grad norm: 50990.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2148/ 159576 | consumed samples: 34368 | elapsed time per iteration (ms): 13681.5 | learning rate: 9.524E-06 | global batch size: 16 | lm loss: 6.979048E+00 | loss scale: 16384.0 | grad norm: 120685.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2149/ 159576 | consumed samples: 34384 | elapsed time per iteration (ms): 13585.6 | learning rate: 9.528E-06 | global batch size: 16 | lm loss: 6.962264E+00 | loss scale: 16384.0 | grad norm: 95096.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2150/ 159576 | consumed samples: 34400 | elapsed time per iteration (ms): 13964.4 | learning rate: 9.533E-06 | global batch size: 16 | lm loss: 7.070148E+00 | loss scale: 16384.0 | grad norm: 102834.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2151/ 159576 | consumed samples: 34416 | elapsed time per iteration (ms): 13597.2 | learning rate: 9.537E-06 | global batch size: 16 | lm loss: 6.998973E+00 | loss scale: 16384.0 | grad norm: 66036.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2152/ 159576 | consumed samples: 34432 | elapsed time per iteration (ms): 13608.8 | learning rate: 9.541E-06 | global batch size: 16 | lm loss: 6.972906E+00 | loss scale: 16384.0 | grad norm: 85292.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2153/ 159576 | consumed samples: 34448 | elapsed time per iteration (ms): 13623.2 | learning rate: 9.546E-06 | global batch size: 16 | lm loss: 6.755056E+00 | loss scale: 16384.0 | grad norm: 76762.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2154/ 159576 | consumed samples: 34464 | elapsed time per iteration (ms): 13956.2 | learning rate: 9.550E-06 | global batch size: 16 | lm loss: 7.015395E+00 | loss scale: 16384.0 | grad norm: 90062.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
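A couple of arithmetic invariants hold record to record here: consumed samples is exactly iteration times the global batch size, and end-to-end throughput follows from the wall-clock time per iteration, about 1.2 samples/s at ~13.7 s per step. That is far below the samples/sec printed in the DeepSpeed step-2000 line earlier, which appears to come from a much narrower internal timer and should not be read as end-to-end throughput. A quick check on the iteration 2154 record above:

    iteration, global_batch_size = 2154, 16
    elapsed_ms = 13956.2                             # from the record above
    assert iteration * global_batch_size == 34464    # matches "consumed samples"
    print(global_batch_size / (elapsed_ms / 1e3))    # ~1.15 samples/s end to end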
iteration 2155/ 159576 | consumed samples: 34480 | elapsed time per iteration (ms): 13759.1 | learning rate: 9.555E-06 | global batch size: 16 | lm loss: 6.815333E+00 | loss scale: 16384.0 | grad norm: 68441.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2156/ 159576 | consumed samples: 34496 | elapsed time per iteration (ms): 13580.0 | learning rate: 9.559E-06 | global batch size: 16 | lm loss: 6.783628E+00 | loss scale: 16384.0 | grad norm: 110716.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2157/ 159576 | consumed samples: 34512 | elapsed time per iteration (ms): 13582.3 | learning rate: 9.564E-06 | global batch size: 16 | lm loss: 7.064082E+00 | loss scale: 16384.0 | grad norm: 62285.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2158/ 159576 | consumed samples: 34528 | elapsed time per iteration (ms): 13596.2 | learning rate: 9.568E-06 | global batch size: 16 | lm loss: 7.092577E+00 | loss scale: 16384.0 | grad norm: 69925.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2159/ 159576 | consumed samples: 34544 | elapsed time per iteration (ms): 13966.6 | learning rate: 9.572E-06 | global batch size: 16 | lm loss: 7.030209E+00 | loss scale: 16384.0 | grad norm: 74908.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2160/ 159576 | consumed samples: 34560 | elapsed time per iteration (ms): 13608.2 | learning rate: 9.577E-06 | global batch size: 16 | lm loss: 6.985407E+00 | loss scale: 16384.0 | grad norm: 107105.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2161/ 159576 | consumed samples: 34576 | elapsed time per iteration (ms): 13591.8 | learning rate: 9.581E-06 | global batch size: 16 | lm loss: 6.846824E+00 | loss scale: 16384.0 | grad norm: 59511.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2162/ 159576 | consumed samples: 34592 | elapsed time per iteration (ms): 13686.7 | learning rate: 9.586E-06 | global batch size: 16 | lm loss: 6.984041E+00 | loss scale: 16384.0 | grad norm: 81334.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2163/ 159576 | consumed samples: 34608 | elapsed time per iteration (ms): 13937.5 | learning rate: 9.590E-06 | global batch size: 16 | lm loss: 7.022871E+00 | loss scale: 16384.0 | grad norm: 84185.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2164/ 159576 | consumed samples: 34624 | elapsed time per iteration (ms): 13577.7 | learning rate: 9.595E-06 | global batch size: 16 | lm loss: 7.029066E+00 | loss scale: 16384.0 | grad norm: 47624.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2165/ 159576 | consumed samples: 34640 | elapsed time per iteration (ms): 13595.6 | learning rate: 9.599E-06 | global batch size: 16 | lm loss: 6.822045E+00 | loss scale: 16384.0 | grad norm: 138589.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2166/ 159576 | consumed samples: 34656 | elapsed time per iteration (ms): 13704.6 | learning rate: 9.604E-06 | global batch size: 16 | lm loss: 6.980874E+00 | loss scale: 16384.0 | grad norm: 80500.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2167/ 159576 | consumed samples: 34672 | elapsed time per iteration (ms): 13517.8 | learning rate: 9.608E-06 | global batch size: 16 | lm loss: 7.052095E+00 | loss scale: 16384.0 | grad norm: 68630.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2168/ 159576 | consumed samples: 34688 | elapsed time per iteration (ms): 13832.6 | learning rate: 9.612E-06 | global batch size: 16 | lm loss: 7.172165E+00 | loss scale: 16384.0 | grad norm: 59001.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2169/ 159576 | consumed samples: 34704 | elapsed time per iteration (ms): 13681.3 | learning rate: 9.617E-06 | global batch size: 16 | lm loss: 7.068394E+00 | loss scale: 16384.0 | grad norm: 73598.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2170/ 159576 | consumed samples: 34720 | elapsed time per iteration (ms): 13669.0 | learning rate: 9.621E-06 | global batch size: 16 | lm loss: 6.842896E+00 | loss scale: 16384.0 | grad norm: 62440.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2171/ 159576 | consumed samples: 34736 | elapsed time per iteration (ms): 13648.5 | learning rate: 9.626E-06 | global batch size: 16 | lm loss: 7.126867E+00 | loss scale: 16384.0 | grad norm: 155364.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2172/ 159576 | consumed samples: 34752 | elapsed time per iteration (ms): 14078.1 | learning rate: 9.630E-06 | global batch size: 16 | lm loss: 7.047744E+00 | loss scale: 16384.0 | grad norm: 113473.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2173/ 159576 | consumed samples: 34768 | elapsed time per iteration (ms): 13680.5 | learning rate: 9.635E-06 | global batch size: 16 | lm loss: 7.016094E+00 | loss scale: 16384.0 | grad norm: 73489.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2174/ 159576 | consumed samples: 34784 | elapsed time per iteration (ms): 13666.0 | learning rate: 9.639E-06 | global batch size: 16 | lm loss: 7.061403E+00 | loss scale: 16384.0 | grad norm: 75521.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2175/ 159576 | consumed samples: 34800 | elapsed time per iteration (ms): 13610.4 | learning rate: 9.643E-06 | global batch size: 16 | lm loss: 7.042882E+00 | loss scale: 16384.0 | grad norm: 95300.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2176/ 159576 | consumed samples: 34816 | elapsed time per iteration (ms): 14108.9 | learning rate: 9.648E-06 | global batch size: 16 | lm loss: 6.915576E+00 | loss scale: 16384.0 | grad norm: 74751.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2177/ 159576 | consumed samples: 34832 | elapsed time per iteration (ms): 13643.1 | learning rate: 9.652E-06 | global batch size: 16 | lm loss: 6.979721E+00 | loss scale: 16384.0 | grad norm: 71252.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2178/ 159576 | consumed samples: 34848 | elapsed time per iteration (ms): 13642.9 | learning rate: 9.657E-06 | global batch size: 16 | lm loss: 6.816618E+00 | loss scale: 16384.0 | grad norm: 60039.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2179/ 159576 | consumed samples: 34864 | elapsed time per iteration (ms): 13628.9 | learning rate: 9.661E-06 | global batch size: 16 | lm loss: 7.054741E+00 | loss scale: 16384.0 | grad norm: 196305.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2180/ 159576 | consumed samples: 34880 | elapsed time per iteration (ms): 13588.5 | learning rate: 9.666E-06 | global batch size: 16 | lm loss: 6.953914E+00 | loss scale: 16384.0 | grad norm: 120715.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2181/ 159576 | consumed samples: 34896 | elapsed time per iteration (ms): 13968.3 | learning rate: 9.670E-06 | global batch size: 16 | lm loss: 7.034101E+00 | loss scale: 16384.0 | grad norm: 81756.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2182/ 159576 | consumed samples: 34912 | elapsed time per iteration (ms): 13658.7 | learning rate: 9.675E-06 | global batch size: 16 | lm loss: 6.787637E+00 | loss scale: 16384.0 | grad norm: 99431.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2183/ 159576 | consumed samples: 34928 | elapsed time per iteration (ms): 13669.1 | learning rate: 9.679E-06 | global batch size: 16 | lm loss: 6.894065E+00 | loss scale: 16384.0 | grad norm: 83400.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2184/ 159576 | consumed samples: 34944 | elapsed time per iteration (ms): 13649.9 | learning rate: 9.683E-06 | global batch size: 16 | lm loss: 6.871455E+00 | loss scale: 16384.0 | grad norm: 159204.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2185/ 159576 | consumed samples: 34960 | elapsed time per iteration (ms): 14059.0 | learning rate: 9.688E-06 | global batch size: 16 | lm loss: 6.954823E+00 | loss scale: 16384.0 | grad norm: 106187.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2186/ 159576 | consumed samples: 34976 | elapsed time per iteration (ms): 13651.8 | learning rate: 9.692E-06 | global batch size: 16 | lm loss: 7.198211E+00 | loss scale: 16384.0 | grad norm: 95306.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2187/ 159576 | consumed samples: 34992 | elapsed time per iteration (ms): 13612.8 | learning rate: 9.697E-06 | global batch size: 16 | lm loss: 7.037758E+00 | loss scale: 16384.0 | grad norm: 86743.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2188/ 159576 | consumed samples: 35008 | elapsed time per iteration (ms): 13616.1 | learning rate: 9.701E-06 | global batch size: 16 | lm loss: 6.780216E+00 | loss scale: 16384.0 | grad norm: 66759.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2189/ 159576 | consumed samples: 35024 | elapsed time per iteration (ms): 13935.4 | learning rate: 9.706E-06 | global batch size: 16 | lm loss: 7.134370E+00 | loss scale: 16384.0 | grad norm: 224387.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2190/ 159576 | consumed samples: 35040 | elapsed time per iteration (ms): 13796.3 | learning rate: 9.710E-06 | global batch size: 16 | lm loss: 6.830962E+00 | loss scale: 16384.0 | grad norm: 184503.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2191/ 159576 | consumed samples: 35056 | elapsed time per iteration (ms): 13596.6 | learning rate: 9.714E-06 | global batch size: 16 | lm loss: 7.006136E+00 | loss scale: 16384.0 | grad norm: 105791.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2192/ 159576 | consumed samples: 35072 | elapsed time per iteration (ms): 13632.0 | learning rate: 9.719E-06 | global batch size: 16 | lm loss: 7.023957E+00 | loss scale: 16384.0 | grad norm: 128317.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2193/ 159576 | consumed samples: 35088 | elapsed time per iteration (ms): 13700.7 | learning rate: 9.723E-06 | global batch size: 16 | lm loss: 6.920637E+00 | loss scale: 16384.0 | grad norm: 90884.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2194/ 159576 | consumed samples: 35104 | elapsed time per iteration (ms): 13995.7 | learning rate: 9.728E-06 | global batch size: 16 | lm loss: 7.240769E+00 | loss scale: 16384.0 | grad norm: 157352.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2195/ 159576 | consumed samples: 35120 | elapsed time per iteration (ms): 13669.4 | learning rate: 9.732E-06 | global batch size: 16 | lm loss: 6.780205E+00 | loss scale: 16384.0 | grad norm: 106455.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2196/ 159576 | consumed samples: 35136 | elapsed time per iteration (ms): 13670.0 | learning rate: 9.737E-06 | global batch size: 16 | lm loss: 6.778285E+00 | loss scale: 16384.0 | grad norm: 86879.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2197/ 159576 | consumed samples: 35152 | elapsed time per iteration (ms): 13661.3 | learning rate: 9.741E-06 | global batch size: 16 | lm loss: 7.030122E+00 | loss scale: 16384.0 | grad norm: 93377.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2198/ 159576 | consumed samples: 35168 | elapsed time per iteration (ms): 13923.4 | learning rate: 9.746E-06 | global batch size: 16 | lm loss: 6.727036E+00 | loss scale: 16384.0 | grad norm: 148918.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2199/ 159576 | consumed samples: 35184 | elapsed time per iteration (ms): 13675.4 | learning rate: 9.750E-06 | global batch size: 16 | lm loss: 7.104040E+00 | loss scale: 16384.0 | grad norm: 135532.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2200/ 159576 | consumed samples: 35200 | elapsed time per iteration (ms): 13739.5 | learning rate: 9.754E-06 | global batch size: 16 | lm loss: 6.969880E+00 | loss scale: 16384.0 | grad norm: 96195.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2201/ 159576 | consumed samples: 35216 | elapsed time per iteration (ms): 13703.1 | learning rate: 9.759E-06 | global batch size: 16 | lm loss: 7.123239E+00 | loss scale:
16384.0 | grad norm: 89259.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2202/ 159576 | consumed samples: 35232 | elapsed time per iteration (ms): 13665.4 | learning rate: 9.763E-06 | global batch size: 16 | lm loss: 6.652438E+00 | loss scale: 16384.0 | grad norm: 70165.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2203/ 159576 | consumed samples: 35248 | elapsed time per iteration (ms): 13954.1 | learning rate: 9.768E-06 | global batch size: 16 | lm loss: 6.943371E+00 | loss scale: 16384.0 | grad norm: 138696.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2204/ 159576 | consumed samples: 35264 | elapsed time per iteration (ms): 13604.7 | learning rate: 9.772E-06 | global batch size: 16 | lm loss: 6.743501E+00 | loss scale: 16384.0 | grad norm: 190526.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2205/ 159576 | consumed samples: 35280 | elapsed time per iteration (ms): 13626.5 | learning rate: 9.777E-06 | global batch size: 16 | lm loss: 6.968715E+00 | loss scale: 16384.0 | grad norm: 97137.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2206/ 159576 | consumed samples: 35296 | elapsed time per iteration (ms): 13767.5 | learning rate: 9.781E-06 | global batch size: 16 | lm loss: 6.911567E+00 | loss scale: 16384.0 | grad norm: 68778.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2207/ 159576 | consumed samples: 35312 | elapsed time per iteration (ms): 14159.2 | learning rate: 9.786E-06 | global batch size: 16 | lm loss: 7.117369E+00 | loss scale: 16384.0 | grad norm: 70066.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2208/ 159576 | consumed samples: 35328 | elapsed time per iteration (ms): 13832.5 | learning rate: 9.790E-06 | global batch size: 16 | lm loss: 7.121370E+00 | loss scale: 16384.0 | grad norm: 98891.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2209/ 159576 | consumed samples: 35344 | elapsed time per iteration (ms): 13749.3 | learning rate: 9.794E-06 | global batch size: 16 | lm loss: 6.873634E+00 | loss scale: 16384.0 | grad norm: 61060.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2210/ 159576 | consumed samples: 35360 | elapsed time per iteration (ms): 13710.7 | learning rate: 9.799E-06 | global batch size: 16 | lm loss: 6.761906E+00 | loss scale: 16384.0 | grad norm: 87340.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2211/ 159576 | consumed samples: 35376 | elapsed time per iteration (ms): 14073.4 | learning rate: 9.803E-06 | global batch size: 16 | lm loss: 6.896225E+00 | loss scale: 16384.0 | grad norm: 67623.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2212/ 159576 | consumed samples: 35392 | elapsed time per iteration (ms): 13676.6 | learning rate: 9.808E-06 | global batch size: 16 | lm loss: 6.925282E+00 | loss scale: 16384.0 | grad norm: 112986.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2213/ 159576 | consumed samples: 35408 | elapsed time per 
iteration (ms): 13682.0 | learning rate: 9.812E-06 | global batch size: 16 | lm loss: 6.932837E+00 | loss scale: 16384.0 | grad norm: 72538.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2214/ 159576 | consumed samples: 35424 | elapsed time per iteration (ms): 13773.0 | learning rate: 9.817E-06 | global batch size: 16 | lm loss: 6.751261E+00 | loss scale: 16384.0 | grad norm: 110253.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2215/ 159576 | consumed samples: 35440 | elapsed time per iteration (ms): 13688.8 | learning rate: 9.821E-06 | global batch size: 16 | lm loss: 6.953260E+00 | loss scale: 16384.0 | grad norm: 85951.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2216/ 159576 | consumed samples: 35456 | elapsed time per iteration (ms): 13877.0 | learning rate: 9.825E-06 | global batch size: 16 | lm loss: 6.963014E+00 | loss scale: 16384.0 | grad norm: 78883.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2217/ 159576 | consumed samples: 35472 | elapsed time per iteration (ms): 13727.8 | learning rate: 9.830E-06 | global batch size: 16 | lm loss: 6.840832E+00 | loss scale: 16384.0 | grad norm: 92435.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2218/ 159576 | consumed samples: 35488 | elapsed time per iteration (ms): 13750.4 | learning rate: 9.834E-06 | global batch size: 16 | lm loss: 6.949021E+00 | loss scale: 16384.0 | grad norm: 60313.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2219/ 159576 | consumed samples: 35504 | elapsed time per iteration (ms): 13607.8 | learning rate: 9.839E-06 | global batch size: 16 | lm loss: 6.950431E+00 | loss scale: 16384.0 | grad norm: 92434.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2220/ 159576 | consumed samples: 35520 | elapsed time per iteration (ms): 14159.9 | learning rate: 9.843E-06 | global batch size: 16 | lm loss: 7.318023E+00 | loss scale: 16384.0 | grad norm: 75178.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2221/ 159576 | consumed samples: 35536 | elapsed time per iteration (ms): 13828.1 | learning rate: 9.848E-06 | global batch size: 16 | lm loss: 6.425551E+00 | loss scale: 16384.0 | grad norm: 66904.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2222/ 159576 | consumed samples: 35552 | elapsed time per iteration (ms): 13669.2 | learning rate: 9.852E-06 | global batch size: 16 | lm loss: 7.016433E+00 | loss scale: 16384.0 | grad norm: 48549.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2223/ 159576 | consumed samples: 35568 | elapsed time per iteration (ms): 13705.5 | learning rate: 9.857E-06 | global batch size: 16 | lm loss: 7.026052E+00 | loss scale: 16384.0 | grad norm: 87253.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2224/ 159576 | consumed samples: 35584 | elapsed time per iteration (ms): 14141.1 | learning rate: 9.861E-06 | global batch size: 16 | lm loss: 7.019730E+00 | loss scale: 16384.0 | grad norm: 75100.959 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 2225/ 159576 | consumed samples: 35600 | elapsed time per iteration (ms): 13696.3 | learning rate: 9.865E-06 | global batch size: 16 | lm loss: 6.750052E+00 | loss scale: 16384.0 | grad norm: 72544.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2226/ 159576 | consumed samples: 35616 | elapsed time per iteration (ms): 13659.8 | learning rate: 9.870E-06 | global batch size: 16 | lm loss: 6.815751E+00 | loss scale: 16384.0 | grad norm: 76403.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2227/ 159576 | consumed samples: 35632 | elapsed time per iteration (ms): 13696.5 | learning rate: 9.874E-06 | global batch size: 16 | lm loss: 6.716208E+00 | loss scale: 16384.0 | grad norm: 70565.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2228/ 159576 | consumed samples: 35648 | elapsed time per iteration (ms): 13652.7 | learning rate: 9.879E-06 | global batch size: 16 | lm loss: 6.902302E+00 | loss scale: 16384.0 | grad norm: 99921.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2229/ 159576 | consumed samples: 35664 | elapsed time per iteration (ms): 13754.5 | learning rate: 9.883E-06 | global batch size: 16 | lm loss: 6.941592E+00 | loss scale: 16384.0 | grad norm: 77045.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2230/ 159576 | consumed samples: 35680 | elapsed time per iteration (ms): 13726.8 | learning rate: 9.888E-06 | global batch size: 16 | lm loss: 7.006780E+00 | loss scale: 16384.0 | grad norm: 79594.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2231/ 159576 | consumed samples: 35696 | elapsed time per iteration (ms): 13704.0 | learning rate: 9.892E-06 | global batch size: 16 | lm loss: 7.056840E+00 | loss scale: 16384.0 | grad norm: 72251.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2232/ 159576 | consumed samples: 35712 | elapsed time per iteration (ms): 13646.8 | learning rate: 9.896E-06 | global batch size: 16 | lm loss: 6.913527E+00 | loss scale: 16384.0 | grad norm: 58442.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2233/ 159576 | consumed samples: 35728 | elapsed time per iteration (ms): 14009.0 | learning rate: 9.901E-06 | global batch size: 16 | lm loss: 6.865626E+00 | loss scale: 16384.0 | grad norm: 73447.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2234/ 159576 | consumed samples: 35744 | elapsed time per iteration (ms): 13550.7 | learning rate: 9.905E-06 | global batch size: 16 | lm loss: 6.954779E+00 | loss scale: 16384.0 | grad norm: 63007.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2235/ 159576 | consumed samples: 35760 | elapsed time per iteration (ms): 13638.3 | learning rate: 9.910E-06 | global batch size: 16 | lm loss: 6.917772E+00 | loss scale: 16384.0 | grad norm: 73029.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2236/ 159576 | consumed samples: 35776 | elapsed time per iteration (ms): 13495.6 | learning rate: 9.914E-06 | global batch size: 16 | lm loss: 
6.899360E+00 | loss scale: 16384.0 | grad norm: 58524.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2237/ 159576 | consumed samples: 35792 | elapsed time per iteration (ms): 13933.0 | learning rate: 9.919E-06 | global batch size: 16 | lm loss: 6.898277E+00 | loss scale: 16384.0 | grad norm: 89250.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2238/ 159576 | consumed samples: 35808 | elapsed time per iteration (ms): 13906.4 | learning rate: 9.923E-06 | global batch size: 16 | lm loss: 6.863415E+00 | loss scale: 16384.0 | grad norm: 57965.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2239/ 159576 | consumed samples: 35824 | elapsed time per iteration (ms): 13638.8 | learning rate: 9.928E-06 | global batch size: 16 | lm loss: 6.994671E+00 | loss scale: 16384.0 | grad norm: 102232.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2240/ 159576 | consumed samples: 35840 | elapsed time per iteration (ms): 13621.9 | learning rate: 9.932E-06 | global batch size: 16 | lm loss: 6.956360E+00 | loss scale: 16384.0 | grad norm: 69904.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2241/ 159576 | consumed samples: 35856 | elapsed time per iteration (ms): 13633.2 | learning rate: 9.936E-06 | global batch size: 16 | lm loss: 6.939447E+00 | loss scale: 16384.0 | grad norm: 95578.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2242/ 159576 | consumed samples: 35872 | elapsed time per iteration (ms): 13726.4 | learning rate: 9.941E-06 | global batch size: 16 | lm loss: 7.046509E+00 | loss scale: 16384.0 | grad norm: 82383.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2243/ 159576 | consumed samples: 35888 | elapsed time per iteration (ms): 13506.7 | learning rate: 9.945E-06 | global batch size: 16 | lm loss: 7.151508E+00 | loss scale: 16384.0 | grad norm: 98476.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2244/ 159576 | consumed samples: 35904 | elapsed time per iteration (ms): 13568.6 | learning rate: 9.950E-06 | global batch size: 16 | lm loss: 6.872870E+00 | loss scale: 16384.0 | grad norm: 74912.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2245/ 159576 | consumed samples: 35920 | elapsed time per iteration (ms): 13602.7 | learning rate: 9.954E-06 | global batch size: 16 | lm loss: 6.673596E+00 | loss scale: 16384.0 | grad norm: 76531.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2246/ 159576 | consumed samples: 35936 | elapsed time per iteration (ms): 14093.3 | learning rate: 9.959E-06 | global batch size: 16 | lm loss: 6.910951E+00 | loss scale: 16384.0 | grad norm: 90155.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2247/ 159576 | consumed samples: 35952 | elapsed time per iteration (ms): 13495.1 | learning rate: 9.963E-06 | global batch size: 16 | lm loss: 6.761725E+00 | loss scale: 16384.0 | grad norm: 71637.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2248/ 159576 | consumed samples: 
35968 | elapsed time per iteration (ms): 13629.2 | learning rate: 9.967E-06 | global batch size: 16 | lm loss: 6.898269E+00 | loss scale: 16384.0 | grad norm: 99310.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2249/ 159576 | consumed samples: 35984 | elapsed time per iteration (ms): 13535.5 | learning rate: 9.972E-06 | global batch size: 16 | lm loss: 6.917497E+00 | loss scale: 16384.0 | grad norm: 74932.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2250/ 159576 | consumed samples: 36000 | elapsed time per iteration (ms): 13554.8 | learning rate: 9.976E-06 | global batch size: 16 | lm loss: 6.728826E+00 | loss scale: 16384.0 | grad norm: 73535.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2251/ 159576 | consumed samples: 36016 | elapsed time per iteration (ms): 13742.7 | learning rate: 9.981E-06 | global batch size: 16 | lm loss: 6.901268E+00 | loss scale: 16384.0 | grad norm: 76822.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2252/ 159576 | consumed samples: 36032 | elapsed time per iteration (ms): 13586.6 | learning rate: 9.985E-06 | global batch size: 16 | lm loss: 6.964120E+00 | loss scale: 16384.0 | grad norm: 47563.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2253/ 159576 | consumed samples: 36048 | elapsed time per iteration (ms): 13621.0 | learning rate: 9.990E-06 | global batch size: 16 | lm loss: 6.976019E+00 | loss scale: 16384.0 | grad norm: 84584.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2254/ 159576 | consumed samples: 36064 | elapsed time per iteration (ms): 13682.5 | learning rate: 9.994E-06 | global batch size: 16 | lm loss: 6.875343E+00 | loss scale: 16384.0 | grad norm: 37745.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2255/ 159576 | consumed samples: 36080 | elapsed time per iteration (ms): 14145.6 | learning rate: 9.999E-06 | global batch size: 16 | lm loss: 6.934249E+00 | loss scale: 16384.0 | grad norm: 136584.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2256/ 159576 | consumed samples: 36096 | elapsed time per iteration (ms): 13651.1 | learning rate: 1.000E-05 | global batch size: 16 | lm loss: 6.785090E+00 | loss scale: 16384.0 | grad norm: 79752.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2257/ 159576 | consumed samples: 36112 | elapsed time per iteration (ms): 13492.4 | learning rate: 1.001E-05 | global batch size: 16 | lm loss: 6.860191E+00 | loss scale: 16384.0 | grad norm: 66550.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2258/ 159576 | consumed samples: 36128 | elapsed time per iteration (ms): 13560.5 | learning rate: 1.001E-05 | global batch size: 16 | lm loss: 6.910413E+00 | loss scale: 16384.0 | grad norm: 67569.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2259/ 159576 | consumed samples: 36144 | elapsed time per iteration (ms): 14039.9 | learning rate: 1.002E-05 | global batch size: 16 | lm loss: 7.188947E+00 | loss scale: 16384.0 | grad norm: 73452.334 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2260/ 159576 | consumed samples: 36160 | elapsed time per iteration (ms): 13575.5 | learning rate: 1.002E-05 | global batch size: 16 | lm loss: 6.873131E+00 | loss scale: 16384.0 | grad norm: 111867.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2261/ 159576 | consumed samples: 36176 | elapsed time per iteration (ms): 13638.2 | learning rate: 1.003E-05 | global batch size: 16 | lm loss: 6.838548E+00 | loss scale: 16384.0 | grad norm: 80423.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2262/ 159576 | consumed samples: 36192 | elapsed time per iteration (ms): 13658.9 | learning rate: 1.003E-05 | global batch size: 16 | lm loss: 7.019104E+00 | loss scale: 16384.0 | grad norm: 84663.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2263/ 159576 | consumed samples: 36208 | elapsed time per iteration (ms): 13616.1 | learning rate: 1.003E-05 | global batch size: 16 | lm loss: 6.917726E+00 | loss scale: 16384.0 | grad norm: 79078.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2264/ 159576 | consumed samples: 36224 | elapsed time per iteration (ms): 13773.7 | learning rate: 1.004E-05 | global batch size: 16 | lm loss: 7.129383E+00 | loss scale: 16384.0 | grad norm: 84356.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2265/ 159576 | consumed samples: 36240 | elapsed time per iteration (ms): 13599.9 | learning rate: 1.004E-05 | global batch size: 16 | lm loss: 6.950484E+00 | loss scale: 16384.0 | grad norm: 96317.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2266/ 159576 | consumed samples: 36256 | elapsed time per iteration (ms): 13555.3 | learning rate: 1.005E-05 | global batch size: 16 | lm loss: 6.983542E+00 | loss scale: 16384.0 | grad norm: 87963.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2267/ 159576 | consumed samples: 36272 | elapsed time per iteration (ms): 13615.4 | learning rate: 1.005E-05 | global batch size: 16 | lm loss: 7.106489E+00 | loss scale: 16384.0 | grad norm: 49938.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2268/ 159576 | consumed samples: 36288 | elapsed time per iteration (ms): 13987.6 | learning rate: 1.006E-05 | global batch size: 16 | lm loss: 6.957284E+00 | loss scale: 16384.0 | grad norm: 80083.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2269/ 159576 | consumed samples: 36304 | elapsed time per iteration (ms): 13613.8 | learning rate: 1.006E-05 | global batch size: 16 | lm loss: 6.895617E+00 | loss scale: 16384.0 | grad norm: 89537.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2270/ 159576 | consumed samples: 36320 | elapsed time per iteration (ms): 13747.0 | learning rate: 1.007E-05 | global batch size: 16 | lm loss: 6.945907E+00 | loss scale: 16384.0 | grad norm: 109400.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2271/ 159576 | consumed samples: 36336 | elapsed time per iteration (ms): 13527.2 | learning rate: 1.007E-05 | global batch 
size: 16 | lm loss: 6.928704E+00 | loss scale: 16384.0 | grad norm: 78576.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2272/ 159576 | consumed samples: 36352 | elapsed time per iteration (ms): 13615.1 | learning rate: 1.007E-05 | global batch size: 16 | lm loss: 7.229642E+00 | loss scale: 16384.0 | grad norm: 80535.103 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2273/ 159576 | consumed samples: 36368 | elapsed time per iteration (ms): 13960.2 | learning rate: 1.008E-05 | global batch size: 16 | lm loss: 6.896622E+00 | loss scale: 16384.0 | grad norm: 65043.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2274/ 159576 | consumed samples: 36384 | elapsed time per iteration (ms): 13538.8 | learning rate: 1.008E-05 | global batch size: 16 | lm loss: 7.013526E+00 | loss scale: 16384.0 | grad norm: 78284.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2275/ 159576 | consumed samples: 36400 | elapsed time per iteration (ms): 13634.5 | learning rate: 1.009E-05 | global batch size: 16 | lm loss: 6.912004E+00 | loss scale: 16384.0 | grad norm: 66988.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2276/ 159576 | consumed samples: 36416 | elapsed time per iteration (ms): 13609.6 | learning rate: 1.009E-05 | global batch size: 16 | lm loss: 6.759723E+00 | loss scale: 16384.0 | grad norm: 69630.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2277/ 159576 | consumed samples: 36432 | elapsed time per iteration (ms): 14096.5 | learning rate: 1.010E-05 | global batch size: 16 | lm loss: 7.025202E+00 | loss scale: 16384.0 | grad norm: 66059.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2278/ 159576 | consumed samples: 36448 | elapsed time per iteration (ms): 13743.0 | learning rate: 1.010E-05 | global batch size: 16 | lm loss: 6.957587E+00 | loss scale: 16384.0 | grad norm: 80177.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2279/ 159576 | consumed samples: 36464 | elapsed time per iteration (ms): 13675.0 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.897773E+00 | loss scale: 16384.0 | grad norm: 50160.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2280/ 159576 | consumed samples: 36480 | elapsed time per iteration (ms): 13581.6 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.697253E+00 | loss scale: 16384.0 | grad norm: 64483.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2281/ 159576 | consumed samples: 36496 | elapsed time per iteration (ms): 13961.5 | learning rate: 1.011E-05 | global batch size: 16 | lm loss: 6.944922E+00 | loss scale: 16384.0 | grad norm: 67869.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2282/ 159576 | consumed samples: 36512 | elapsed time per iteration (ms): 13505.0 | learning rate: 1.012E-05 | global batch size: 16 | lm loss: 6.410736E+00 | loss scale: 16384.0 | grad norm: 49766.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2283/ 159576 | 
consumed samples: 36528 | elapsed time per iteration (ms): 13611.4 | learning rate: 1.012E-05 | global batch size: 16 | lm loss: 6.772882E+00 | loss scale: 16384.0 | grad norm: 59961.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2284/ 159576 | consumed samples: 36544 | elapsed time per iteration (ms): 13596.5 | learning rate: 1.013E-05 | global batch size: 16 | lm loss: 6.794603E+00 | loss scale: 16384.0 | grad norm: 68562.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2285/ 159576 | consumed samples: 36560 | elapsed time per iteration (ms): 13567.2 | learning rate: 1.013E-05 | global batch size: 16 | lm loss: 7.113194E+00 | loss scale: 16384.0 | grad norm: 59728.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2286/ 159576 | consumed samples: 36576 | elapsed time per iteration (ms): 13847.6 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 6.799785E+00 | loss scale: 16384.0 | grad norm: 76247.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2287/ 159576 | consumed samples: 36592 | elapsed time per iteration (ms): 13611.9 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 7.034187E+00 | loss scale: 16384.0 | grad norm: 50151.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2288/ 159576 | consumed samples: 36608 | elapsed time per iteration (ms): 13533.2 | learning rate: 1.014E-05 | global batch size: 16 | lm loss: 6.881348E+00 | loss scale: 16384.0 | grad norm: 130377.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2289/ 159576 | consumed samples: 36624 | elapsed time per iteration (ms): 13525.7 | learning rate: 1.015E-05 | global batch size: 16 | lm loss: 6.952589E+00 | loss scale: 16384.0 | grad norm: 68434.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2290/ 159576 | consumed samples: 36640 | elapsed time per iteration (ms): 13963.1 | learning rate: 1.015E-05 | global batch size: 16 | lm loss: 6.887176E+00 | loss scale: 16384.0 | grad norm: 89636.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2291/ 159576 | consumed samples: 36656 | elapsed time per iteration (ms): 13620.5 | learning rate: 1.016E-05 | global batch size: 16 | lm loss: 6.846462E+00 | loss scale: 16384.0 | grad norm: 73199.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2292/ 159576 | consumed samples: 36672 | elapsed time per iteration (ms): 13656.0 | learning rate: 1.016E-05 | global batch size: 16 | lm loss: 7.302676E+00 | loss scale: 16384.0 | grad norm: 174677.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2293/ 159576 | consumed samples: 36688 | elapsed time per iteration (ms): 13714.2 | learning rate: 1.017E-05 | global batch size: 16 | lm loss: 7.151010E+00 | loss scale: 16384.0 | grad norm: 135612.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2294/ 159576 | consumed samples: 36704 | elapsed time per iteration (ms): 13919.9 | learning rate: 1.017E-05 | global batch size: 16 | lm loss: 7.005547E+00 | loss scale: 16384.0 | grad norm: 89084.825 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2295/ 159576 | consumed samples: 36720 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.588016E+00 | loss scale: 16384.0 | grad norm: 102875.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2296/ 159576 | consumed samples: 36736 | elapsed time per iteration (ms): 13574.9 | learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.896825E+00 | loss scale: 16384.0 | grad norm: 70940.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2297/ 159576 | consumed samples: 36752 | elapsed time per iteration (ms): 13573.3 | learning rate: 1.018E-05 | global batch size: 16 | lm loss: 6.883708E+00 | loss scale: 16384.0 | grad norm: 146744.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2298/ 159576 | consumed samples: 36768 | elapsed time per iteration (ms): 13649.6 | learning rate: 1.019E-05 | global batch size: 16 | lm loss: 7.139965E+00 | loss scale: 16384.0 | grad norm: 75816.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2299/ 159576 | consumed samples: 36784 | elapsed time per iteration (ms): 13959.1 | learning rate: 1.019E-05 | global batch size: 16 | lm loss: 6.811082E+00 | loss scale: 16384.0 | grad norm: 83246.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2300/ 159576 | consumed samples: 36800 | elapsed time per iteration (ms): 13736.9 | learning rate: 1.020E-05 | global batch size: 16 | lm loss: 6.719008E+00 | loss scale: 16384.0 | grad norm: 93595.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2301/ 159576 | consumed samples: 36816 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.020E-05 | global batch size: 16 | lm loss: 7.039846E+00 | loss scale: 16384.0 | grad norm: 58298.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2302/ 159576 | consumed samples: 36832 | elapsed time per iteration (ms): 13631.9 | learning rate: 1.021E-05 | global batch size: 16 | lm loss: 6.796918E+00 | loss scale: 16384.0 | grad norm: 153620.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2303/ 159576 | consumed samples: 36848 | elapsed time per iteration (ms): 13914.1 | learning rate: 1.021E-05 | global batch size: 16 | lm loss: 7.011253E+00 | loss scale: 16384.0 | grad norm: 79116.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2304/ 159576 | consumed samples: 36864 | elapsed time per iteration (ms): 13578.2 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.786969E+00 | loss scale: 16384.0 | grad norm: 78214.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2305/ 159576 | consumed samples: 36880 | elapsed time per iteration (ms): 13606.1 | learning rate: 1.022E-05 | global batch size: 16 | lm loss: 6.896228E+00 | loss scale: 16384.0 | grad norm: 59758.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2306/ 159576 | consumed samples: 36896 | elapsed time per iteration (ms): 13630.5 | learning rate: 
1.022E-05 | global batch size: 16 | lm loss: 6.715625E+00 | loss scale: 16384.0 | grad norm: 82018.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-24 11:06:17] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition) [2021-09-24 11:06:17] PULSE: tr8-104B is running for 5:14:06 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]) iteration 2307/ 159576 | consumed samples: 36912 | elapsed time per iteration (ms): 13695.2 | learning rate: 1.023E-05 | global batch size: 16 | lm loss: 6.898945E+00 | loss scale: 16384.0 | grad norm: 69074.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2308/ 159576 | consumed samples: 36928 | elapsed time per iteration (ms): 13864.3 | learning rate: 1.023E-05 | global batch size: 16 | lm loss: 6.896221E+00 | loss scale: 16384.0 | grad norm: 86879.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2309/ 159576 | consumed samples: 36944 | elapsed time per iteration (ms): 13567.7 | learning rate: 1.024E-05 | global batch size: 16 | lm loss: 6.747959E+00 | loss scale: 16384.0 | grad norm: 77379.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2310/ 159576 | consumed samples: 36960 | elapsed time per iteration (ms): 13717.6 | learning rate: 1.024E-05 | global batch size: 16 | lm loss: 6.945070E+00 | loss scale: 16384.0 | grad norm: 55236.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2311/ 159576 | consumed samples: 36976 | elapsed time per iteration (ms): 13519.2 | learning rate: 1.025E-05 | global batch size: 16 | lm loss: 7.033360E+00 | loss scale: 16384.0 | grad norm: 184283.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2312/ 159576 | consumed samples: 36992 | elapsed time per iteration (ms): 14030.2 | learning rate: 1.025E-05 | global batch size: 16 | lm loss: 7.147439E+00 | loss scale: 16384.0 | grad norm: 152407.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2313/ 159576 | consumed samples: 37008 | elapsed time per iteration (ms): 13685.4 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.739760E+00 | loss scale: 16384.0 | grad norm: 71801.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2314/ 159576 | consumed samples: 37024 | elapsed time per iteration (ms): 13648.0 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.839672E+00 | loss scale: 16384.0 | grad norm: 112304.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2315/ 159576 | consumed samples: 37040 | elapsed time per iteration (ms): 13683.0 | learning rate: 1.026E-05 | global batch size: 16 | lm loss: 6.987888E+00 | loss scale: 16384.0 | grad norm: 97383.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2316/ 159576 | consumed samples: 37056 | elapsed time per iteration (ms): 
14019.7 | learning rate: 1.027E-05 | global batch size: 16 | lm loss: 6.766959E+00 | loss scale: 16384.0 | grad norm: 70142.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2317/ 159576 | consumed samples: 37072 | elapsed time per iteration (ms): 13698.7 | learning rate: 1.027E-05 | global batch size: 16 | lm loss: 7.002495E+00 | loss scale: 16384.0 | grad norm: 94556.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2318/ 159576 | consumed samples: 37088 | elapsed time per iteration (ms): 13548.8 | learning rate: 1.028E-05 | global batch size: 16 | lm loss: 6.785909E+00 | loss scale: 16384.0 | grad norm: 84852.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2319/ 159576 | consumed samples: 37104 | elapsed time per iteration (ms): 13558.1 | learning rate: 1.028E-05 | global batch size: 16 | lm loss: 6.969275E+00 | loss scale: 16384.0 | grad norm: 88628.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2320/ 159576 | consumed samples: 37120 | elapsed time per iteration (ms): 13584.6 | learning rate: 1.029E-05 | global batch size: 16 | lm loss: 6.991512E+00 | loss scale: 16384.0 | grad norm: 73561.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2321/ 159576 | consumed samples: 37136 | elapsed time per iteration (ms): 13808.4 | learning rate: 1.029E-05 | global batch size: 16 | lm loss: 6.689001E+00 | loss scale: 16384.0 | grad norm: 79235.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2322/ 159576 | consumed samples: 37152 | elapsed time per iteration (ms): 13660.8 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.829502E+00 | loss scale: 16384.0 | grad norm: 69229.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2323/ 159576 | consumed samples: 37168 | elapsed time per iteration (ms): 13667.4 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.532575E+00 | loss scale: 16384.0 | grad norm: 55927.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2324/ 159576 | consumed samples: 37184 | elapsed time per iteration (ms): 13703.5 | learning rate: 1.030E-05 | global batch size: 16 | lm loss: 6.922344E+00 | loss scale: 16384.0 | grad norm: 55395.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2325/ 159576 | consumed samples: 37200 | elapsed time per iteration (ms): 14028.0 | learning rate: 1.031E-05 | global batch size: 16 | lm loss: 6.827266E+00 | loss scale: 16384.0 | grad norm: 53256.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2326/ 159576 | consumed samples: 37216 | elapsed time per iteration (ms): 13463.4 | learning rate: 1.031E-05 | global batch size: 16 | lm loss: 6.792019E+00 | loss scale: 16384.0 | grad norm: 61740.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2327/ 159576 | consumed samples: 37232 | elapsed time per iteration (ms): 13567.6 | learning rate: 1.032E-05 | global batch size: 16 | lm loss: 6.871485E+00 | loss scale: 16384.0 | grad norm: 65916.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 2328/ 159576 | consumed samples: 37248 | elapsed time per iteration (ms): 13610.6 | learning rate: 1.032E-05 | global batch size: 16 | lm loss: 6.773655E+00 | loss scale: 16384.0 | grad norm: 55451.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2329/ 159576 | consumed samples: 37264 | elapsed time per iteration (ms): 13843.3 | learning rate: 1.033E-05 | global batch size: 16 | lm loss: 6.881806E+00 | loss scale: 16384.0 | grad norm: 68242.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2330/ 159576 | consumed samples: 37280 | elapsed time per iteration (ms): 13903.0 | learning rate: 1.033E-05 | global batch size: 16 | lm loss: 6.769863E+00 | loss scale: 16384.0 | grad norm: 54395.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2331/ 159576 | consumed samples: 37296 | elapsed time per iteration (ms): 13689.8 | learning rate: 1.034E-05 | global batch size: 16 | lm loss: 6.915558E+00 | loss scale: 16384.0 | grad norm: 69787.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2332/ 159576 | consumed samples: 37312 | elapsed time per iteration (ms): 13584.4 | learning rate: 1.034E-05 | global batch size: 16 | lm loss: 6.872691E+00 | loss scale: 16384.0 | grad norm: 53158.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2333/ 159576 | consumed samples: 37328 | elapsed time per iteration (ms): 13510.8 | learning rate: 1.034E-05 | global batch size: 16 | lm loss: 6.772065E+00 | loss scale: 16384.0 | grad norm: 62866.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2334/ 159576 | consumed samples: 37344 | elapsed time per iteration (ms): 13981.1 | learning rate: 1.035E-05 | global batch size: 16 | lm loss: 6.889673E+00 | loss scale: 16384.0 | grad norm: 79595.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2335/ 159576 | consumed samples: 37360 | elapsed time per iteration (ms): 13567.6 | learning rate: 1.035E-05 | global batch size: 16 | lm loss: 6.996318E+00 | loss scale: 16384.0 | grad norm: 47255.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2336/ 159576 | consumed samples: 37376 | elapsed time per iteration (ms): 13643.5 | learning rate: 1.036E-05 | global batch size: 16 | lm loss: 6.824782E+00 | loss scale: 16384.0 | grad norm: 152401.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2337/ 159576 | consumed samples: 37392 | elapsed time per iteration (ms): 13630.4 | learning rate: 1.036E-05 | global batch size: 16 | lm loss: 6.711504E+00 | loss scale: 16384.0 | grad norm: 73188.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2338/ 159576 | consumed samples: 37408 | elapsed time per iteration (ms): 14043.0 | learning rate: 1.037E-05 | global batch size: 16 | lm loss: 6.830018E+00 | loss scale: 16384.0 | grad norm: 92791.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2339/ 159576 | consumed samples: 37424 | elapsed time per iteration (ms): 13758.4 | learning rate: 1.037E-05 | global batch size: 16 | lm loss: 7.017688E+00 | 
loss scale: 16384.0 | grad norm: 87062.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2340/ 159576 | consumed samples: 37440 | elapsed time per iteration (ms): 13518.0 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 6.749167E+00 | loss scale: 16384.0 | grad norm: 72774.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2341/ 159576 | consumed samples: 37456 | elapsed time per iteration (ms): 13582.6 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.188419E+00 | loss scale: 16384.0 | grad norm: 400324.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2342/ 159576 | consumed samples: 37472 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.038E-05 | global batch size: 16 | lm loss: 7.124457E+00 | loss scale: 16384.0 | grad norm: 441674.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2343/ 159576 | consumed samples: 37488 | elapsed time per iteration (ms): 13721.9 | learning rate: 1.039E-05 | global batch size: 16 | lm loss: 6.941244E+00 | loss scale: 16384.0 | grad norm: 218702.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2344/ 159576 | consumed samples: 37504 | elapsed time per iteration (ms): 13653.7 | learning rate: 1.039E-05 | global batch size: 16 | lm loss: 6.768173E+00 | loss scale: 16384.0 | grad norm: 93071.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2345/ 159576 | consumed samples: 37520 | elapsed time per iteration (ms): 13684.4 | learning rate: 1.040E-05 | global batch size: 16 | lm loss: 6.862311E+00 | loss scale: 16384.0 | grad norm: 105985.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2346/ 159576 | consumed samples: 37536 | elapsed time per iteration (ms): 13732.9 | learning rate: 1.040E-05 | global batch size: 16 | lm loss: 7.097474E+00 | loss scale: 16384.0 | grad norm: 93646.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2347/ 159576 | consumed samples: 37552 | elapsed time per iteration (ms): 14087.6 | learning rate: 1.041E-05 | global batch size: 16 | lm loss: 6.949347E+00 | loss scale: 16384.0 | grad norm: 169536.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2348/ 159576 | consumed samples: 37568 | elapsed time per iteration (ms): 13603.2 | learning rate: 1.041E-05 | global batch size: 16 | lm loss: 6.839984E+00 | loss scale: 16384.0 | grad norm: 221068.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2349/ 159576 | consumed samples: 37584 | elapsed time per iteration (ms): 13602.7 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 6.722544E+00 | loss scale: 16384.0 | grad norm: 90138.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2350/ 159576 | consumed samples: 37600 | elapsed time per iteration (ms): 13600.0 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 6.765959E+00 | loss scale: 16384.0 | grad norm: 87849.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2351/ 159576 | consumed samples: 37616 | 
elapsed time per iteration (ms): 14049.9 | learning rate: 1.042E-05 | global batch size: 16 | lm loss: 7.058582E+00 | loss scale: 16384.0 | grad norm: 97203.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2352/ 159576 | consumed samples: 37632 | elapsed time per iteration (ms): 13664.4 | learning rate: 1.043E-05 | global batch size: 16 | lm loss: 6.709276E+00 | loss scale: 16384.0 | grad norm: 64321.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2353/ 159576 | consumed samples: 37648 | elapsed time per iteration (ms): 13697.2 | learning rate: 1.043E-05 | global batch size: 16 | lm loss: 6.963477E+00 | loss scale: 16384.0 | grad norm: 219491.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2354/ 159576 | consumed samples: 37664 | elapsed time per iteration (ms): 13647.8 | learning rate: 1.044E-05 | global batch size: 16 | lm loss: 6.986011E+00 | loss scale: 16384.0 | grad norm: 159710.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2355/ 159576 | consumed samples: 37680 | elapsed time per iteration (ms): 13594.7 | learning rate: 1.044E-05 | global batch size: 16 | lm loss: 6.833197E+00 | loss scale: 16384.0 | grad norm: 97227.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2356/ 159576 | consumed samples: 37696 | elapsed time per iteration (ms): 13840.6 | learning rate: 1.045E-05 | global batch size: 16 | lm loss: 7.008437E+00 | loss scale: 16384.0 | grad norm: 89122.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2357/ 159576 | consumed samples: 37712 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.045E-05 | global batch size: 16 | lm loss: 6.835823E+00 | loss scale: 16384.0 | grad norm: 77947.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2358/ 159576 | consumed samples: 37728 | elapsed time per iteration (ms): 13642.6 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.735652E+00 | loss scale: 16384.0 | grad norm: 162106.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2359/ 159576 | consumed samples: 37744 | elapsed time per iteration (ms): 13658.5 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.785017E+00 | loss scale: 16384.0 | grad norm: 128794.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2360/ 159576 | consumed samples: 37760 | elapsed time per iteration (ms): 14062.2 | learning rate: 1.046E-05 | global batch size: 16 | lm loss: 6.878942E+00 | loss scale: 16384.0 | grad norm: 101269.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2361/ 159576 | consumed samples: 37776 | elapsed time per iteration (ms): 13561.0 | learning rate: 1.047E-05 | global batch size: 16 | lm loss: 6.893463E+00 | loss scale: 16384.0 | grad norm: 78515.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2362/ 159576 | consumed samples: 37792 | elapsed time per iteration (ms): 13714.6 | learning rate: 1.047E-05 | global batch size: 16 | lm loss: 6.821845E+00 | loss scale: 16384.0 | grad norm: 78649.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2363/ 159576 | consumed samples: 37808 | elapsed time per iteration (ms): 13594.5 | learning rate: 1.048E-05 | global batch size: 16 | lm loss: 6.845947E+00 | loss scale: 16384.0 | grad norm: 158409.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2364/ 159576 | consumed samples: 37824 | elapsed time per iteration (ms): 13648.4 | learning rate: 1.048E-05 | global batch size: 16 | lm loss: 6.840971E+00 | loss scale: 16384.0 | grad norm: 88723.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2365/ 159576 | consumed samples: 37840 | elapsed time per iteration (ms): 13958.9 | learning rate: 1.049E-05 | global batch size: 16 | lm loss: 6.785653E+00 | loss scale: 16384.0 | grad norm: 106713.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2366/ 159576 | consumed samples: 37856 | elapsed time per iteration (ms): 13666.9 | learning rate: 1.049E-05 | global batch size: 16 | lm loss: 6.917600E+00 | loss scale: 16384.0 | grad norm: 90335.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2367/ 159576 | consumed samples: 37872 | elapsed time per iteration (ms): 13690.6 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.840955E+00 | loss scale: 16384.0 | grad norm: 63357.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2368/ 159576 | consumed samples: 37888 | elapsed time per iteration (ms): 13664.8 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.916069E+00 | loss scale: 16384.0 | grad norm: 107961.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2369/ 159576 | consumed samples: 37904 | elapsed time per iteration (ms): 14065.2 | learning rate: 1.050E-05 | global batch size: 16 | lm loss: 6.853414E+00 | loss scale: 16384.0 | grad norm: 84442.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2370/ 159576 | consumed samples: 37920 | elapsed time per iteration (ms): 13656.3 | learning rate: 1.051E-05 | global batch size: 16 | lm loss: 6.827930E+00 | loss scale: 16384.0 | grad norm: 62880.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2371/ 159576 | consumed samples: 37936 | elapsed time per iteration (ms): 13590.5 | learning rate: 1.051E-05 | global batch size: 16 | lm loss: 6.877656E+00 | loss scale: 16384.0 | grad norm: 75866.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2372/ 159576 | consumed samples: 37952 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.052E-05 | global batch size: 16 | lm loss: 6.995963E+00 | loss scale: 16384.0 | grad norm: 71192.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2373/ 159576 | consumed samples: 37968 | elapsed time per iteration (ms): 13951.5 | learning rate: 1.052E-05 | global batch size: 16 | lm loss: 6.794531E+00 | loss scale: 16384.0 | grad norm: 64517.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2374/ 159576 | consumed samples: 37984 | elapsed time per iteration (ms): 13624.2 | learning rate: 1.053E-05 | global batch size: 16 | lm loss: 6.780855E+00 | loss scale: 16384.0 | grad norm: 83255.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2375/ 159576 | consumed samples: 38000 | elapsed time per iteration (ms): 13615.3 | learning rate: 1.053E-05 | global batch size: 16 | lm loss: 6.964709E+00 | loss scale: 16384.0 | grad norm: 79867.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2376/ 159576 | consumed samples: 38016 | elapsed time per iteration (ms): 13718.1 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.657259E+00 | loss scale: 16384.0 | grad norm: 60555.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2377/ 159576 | consumed samples: 38032 | elapsed time per iteration (ms): 13629.0 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.923594E+00 | loss scale: 16384.0 | grad norm: 52753.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2378/ 159576 | consumed samples: 38048 | elapsed time per iteration (ms): 13734.6 | learning rate: 1.054E-05 | global batch size: 16 | lm loss: 6.887539E+00 | loss scale: 16384.0 | grad norm: 103430.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2379/ 159576 | consumed samples: 38064 | elapsed time per iteration (ms): 13608.8 | learning rate: 1.055E-05 | global batch size: 16 | lm loss: 6.627044E+00 | loss scale: 16384.0 | grad norm: 73977.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2380/ 159576 | consumed samples: 38080 | elapsed time per iteration (ms): 13595.9 | learning rate: 1.055E-05 | global batch size: 16 | lm loss: 6.894679E+00 | loss scale: 16384.0 | grad norm: 66400.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2381/ 159576 | consumed samples: 38096 | elapsed time per iteration (ms): 13599.7 | learning rate: 1.056E-05 | global batch size: 16 | lm loss: 6.938529E+00 | loss scale: 16384.0 | grad norm: 70512.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2382/ 159576 | consumed samples: 38112 | elapsed time per iteration (ms): 14135.5 | learning rate: 1.056E-05 | global batch size: 16 | lm loss: 7.303653E+00 | loss scale: 16384.0 | grad norm: 79783.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2383/ 159576 | consumed samples: 38128 | elapsed time per iteration (ms): 13647.3 | learning rate: 1.057E-05 | global batch size: 16 | lm loss: 6.764983E+00 | loss scale: 16384.0 | grad norm: 74049.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2384/ 159576 | consumed samples: 38144 | elapsed time per iteration (ms): 13719.9 | learning rate: 1.057E-05 | global batch size: 16 | lm loss: 7.032783E+00 | loss scale: 16384.0 | grad norm: 66855.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2385/ 159576 | consumed samples: 38160 | elapsed time per iteration (ms): 13573.5 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.839710E+00 | loss scale: 16384.0 | grad norm: 58744.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2386/ 159576 | consumed samples: 38176 | elapsed time per iteration (ms): 14051.4 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.409803E+00 | loss scale: 16384.0 | grad norm: 54804.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2387/ 159576 | consumed samples: 38192 | elapsed time per iteration (ms): 13628.8 | learning rate: 1.058E-05 | global batch size: 16 | lm loss: 6.752995E+00 | loss scale: 16384.0 | grad norm: 57078.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2388/ 159576 | consumed samples: 38208 | elapsed time per iteration (ms): 13611.0 | learning rate: 1.059E-05 | global batch size: 16 | lm loss: 6.738320E+00 | loss scale: 16384.0 | grad norm: 45381.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2389/ 159576 | consumed samples: 38224 | elapsed time per iteration (ms): 13583.7 | learning rate: 1.059E-05 | global batch size: 16 | lm loss: 6.858883E+00 | loss scale: 16384.0 | grad norm: 86212.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2390/ 159576 | consumed samples: 38240 | elapsed time per iteration (ms): 13679.8 | learning rate: 1.060E-05 | global batch size: 16 | lm loss: 7.024375E+00 | loss scale: 16384.0 | grad norm: 66322.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2391/ 159576 | consumed samples: 38256 | elapsed time per iteration (ms): 13997.0 | learning rate: 1.060E-05 | global batch size: 16 | lm loss: 6.983364E+00 | loss scale: 16384.0 | grad norm: 84730.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2392/ 159576 | consumed samples: 38272 | elapsed time per iteration (ms): 13673.8 | learning rate: 1.061E-05 | global batch size: 16 | lm loss: 6.900928E+00 | loss scale: 16384.0 | grad norm: 52849.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2393/ 159576 | consumed samples: 38288 | elapsed time per iteration (ms): 13615.2 | learning rate: 1.061E-05 | global batch size: 16 | lm loss: 6.866693E+00 | loss scale: 16384.0 | grad norm: 87208.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2394/ 159576 | consumed samples: 38304 | elapsed time per iteration (ms): 13615.9 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.702727E+00 | loss scale: 16384.0 | grad norm: 69928.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2395/ 159576 | consumed samples: 38320 | elapsed time per iteration (ms): 14056.6 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.909261E+00 | loss scale: 16384.0 | grad norm: 122690.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2396/ 159576 | consumed samples: 38336 | elapsed time per iteration (ms): 13483.1 | learning rate: 1.062E-05 | global batch size: 16 | lm loss: 6.938586E+00 | loss scale: 16384.0 | grad norm: 80283.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2397/ 159576 | consumed samples: 38352 | elapsed time per iteration (ms): 13678.0 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.916673E+00 | loss scale: 16384.0 | grad norm: 78417.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2398/ 159576 | consumed samples: 38368 | elapsed time per iteration (ms): 13713.3 | learning rate: 1.063E-05 | global batch size: 16 | lm loss: 6.894761E+00 | loss scale: 16384.0 | grad norm: 79613.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2399/ 159576 | consumed samples: 38384 | elapsed time per iteration (ms): 13844.0 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 6.895288E+00 | loss scale: 16384.0 | grad norm: 117360.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2400/ 159576 | consumed samples: 38400 | elapsed time per iteration (ms): 13869.8 | learning rate: 1.064E-05 | global batch size: 16 | lm loss: 7.002610E+00 | loss scale: 16384.0 | grad norm: 98958.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2401/ 159576 | consumed samples: 38416 | elapsed time per iteration (ms): 13601.8 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 6.744779E+00 | loss scale: 16384.0 | grad norm: 75497.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2402/ 159576 | consumed samples: 38432 | elapsed time per iteration (ms): 13599.2 | learning rate: 1.065E-05 | global batch size: 16 | lm loss: 7.107717E+00 | loss scale: 16384.0 | grad norm: 78343.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2403/ 159576 | consumed samples: 38448 | elapsed time per iteration (ms): 13623.1 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.897991E+00 | loss scale: 16384.0 | grad norm: 89054.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2404/ 159576 | consumed samples: 38464 | elapsed time per iteration (ms): 14088.2 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.915084E+00 | loss scale: 16384.0 | grad norm: 88153.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2405/ 159576 | consumed samples: 38480 | elapsed time per iteration (ms): 13711.7 | learning rate: 1.066E-05 | global batch size: 16 | lm loss: 6.791551E+00 | loss scale: 16384.0 | grad norm: 81047.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2406/ 159576 | consumed samples: 38496 | elapsed time per iteration (ms): 13659.9 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.768214E+00 | loss scale: 16384.0 | grad norm: 63942.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2407/ 159576 | consumed samples: 38512 | elapsed time per iteration (ms): 13659.5 | learning rate: 1.067E-05 | global batch size: 16 | lm loss: 6.785830E+00 | loss scale: 16384.0 | grad norm: 50544.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2408/ 159576 | consumed samples: 38528 | elapsed time per iteration (ms): 14010.2 | learning rate: 1.068E-05 | global batch size: 16 | lm loss: 6.781000E+00 | loss scale: 16384.0 | grad norm: 114170.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2409/ 159576 | consumed samples: 38544 | elapsed time per iteration (ms): 13587.7 | learning rate: 1.068E-05 | global batch size: 16 | lm loss: 6.876911E+00 | loss scale: 16384.0 | grad norm: 60235.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2410/ 159576 | consumed samples: 38560 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.837091E+00 | loss scale: 16384.0 | grad norm: 72387.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2411/ 159576 | consumed samples: 38576 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.069E-05 | global batch size: 16 | lm loss: 6.912636E+00 | loss scale: 16384.0 | grad norm: 76432.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2412/ 159576 | consumed samples: 38592 | elapsed time per iteration (ms): 13569.6 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.712539E+00 | loss scale: 16384.0 | grad norm: 113832.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2413/ 159576 | consumed samples: 38608 | elapsed time per iteration (ms): 13932.9 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.804219E+00 | loss scale: 16384.0 | grad norm: 73073.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2414/ 159576 | consumed samples: 38624 | elapsed time per iteration (ms): 13742.1 | learning rate: 1.070E-05 | global batch size: 16 | lm loss: 6.947999E+00 | loss scale: 16384.0 | grad norm: 90599.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2415/ 159576 | consumed samples: 38640 | elapsed time per iteration (ms): 13556.3 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 7.002557E+00 | loss scale: 16384.0 | grad norm: 71840.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2416/ 159576 | consumed samples: 38656 | elapsed time per iteration (ms): 13593.5 | learning rate: 1.071E-05 | global batch size: 16 | lm loss: 6.920745E+00 | loss scale: 16384.0 | grad norm: 60284.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2417/ 159576 | consumed samples: 38672 | elapsed time per iteration (ms): 14084.6 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 7.137000E+00 | loss scale: 16384.0 | grad norm: 185539.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2418/ 159576 | consumed samples: 38688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.072E-05 | global batch size: 16 | lm loss: 6.757603E+00 | loss scale: 16384.0 | grad norm: 127319.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2419/ 159576 | consumed samples: 38704 | elapsed time per iteration (ms): 13580.1 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.869411E+00 | loss scale: 16384.0 | grad norm: 97709.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2420/ 159576 | consumed samples: 38720 | elapsed time per iteration (ms): 13629.2 | learning rate: 1.073E-05 | global batch size: 16 | lm loss: 6.709553E+00 | loss scale: 16384.0 | grad norm: 92144.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2421/ 159576 | consumed samples: 38736 | elapsed time per iteration (ms): 14151.6 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.884684E+00 | loss scale: 16384.0 | grad norm: 68698.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2422/ 159576 | consumed samples: 38752 | elapsed time per iteration (ms): 13613.5 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.869916E+00 | loss scale: 16384.0 | grad norm: 183504.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2423/ 159576 | consumed samples: 38768 | elapsed time per iteration (ms): 13633.7 | learning rate: 1.074E-05 | global batch size: 16 | lm loss: 6.890718E+00 | loss scale: 16384.0 | grad norm: 156548.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2424/ 159576 | consumed samples: 38784 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.935307E+00 | loss scale: 16384.0 | grad norm: 64330.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2425/ 159576 | consumed samples: 38800 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.075E-05 | global batch size: 16 | lm loss: 6.766086E+00 | loss scale: 16384.0 | grad norm: 69465.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2426/ 159576 | consumed samples: 38816 | elapsed time per iteration (ms): 13928.6 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.066947E+00 | loss scale: 16384.0 | grad norm: 107634.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2427/ 159576 | consumed samples: 38832 | elapsed time per iteration (ms): 13650.1 | learning rate: 1.076E-05 | global batch size: 16 | lm loss: 7.050639E+00 | loss scale: 16384.0 | grad norm: 95342.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2428/ 159576 | consumed samples: 38848 | elapsed time per iteration (ms): 13681.2 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 6.855616E+00 | loss scale: 16384.0 | grad norm: 59595.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2429/ 159576 | consumed samples: 38864 | elapsed time per iteration (ms): 13695.9 | learning rate: 1.077E-05 | global batch size: 16 | lm loss: 7.041804E+00 | loss scale: 16384.0 | grad norm: 65131.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2430/ 159576 | consumed samples: 38880 | elapsed time per iteration (ms): 13962.7 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.803939E+00 | loss scale: 16384.0 | grad norm: 63269.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2431/ 159576 | consumed samples: 38896 | elapsed time per iteration (ms): 13583.2 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.876345E+00 | loss scale: 16384.0 | grad norm: 74949.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2432/ 159576 | consumed samples: 38912 | elapsed time per iteration (ms): 13606.6 | learning rate: 1.078E-05 | global batch size: 16 | lm loss: 6.916327E+00 | loss scale: 16384.0 | grad norm: 74586.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2433/ 159576 | consumed samples: 38928 | elapsed time per iteration (ms): 13607.5 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.779680E+00 | loss scale: 16384.0 | grad norm: 82519.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2434/ 159576 | consumed samples: 38944 | elapsed time per iteration (ms): 13894.0 | learning rate: 1.079E-05 | global batch size: 16 | lm loss: 6.903611E+00 | loss scale: 16384.0 | grad norm: 69004.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2435/ 159576 | consumed samples: 38960 | elapsed time per iteration (ms): 13779.1 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.630243E+00 | loss scale: 16384.0 | grad norm: 107197.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2436/ 159576 | consumed samples: 38976 | elapsed time per iteration (ms): 13659.0 | learning rate: 1.080E-05 | global batch size: 16 | lm loss: 6.876919E+00 | loss scale: 16384.0 | grad norm: 77407.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2437/ 159576 | consumed samples: 38992 | elapsed time per iteration (ms): 13553.5 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.728307E+00 | loss scale: 16384.0 | grad norm: 79645.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2438/ 159576 | consumed samples: 39008 | elapsed time per iteration (ms): 13664.0 | learning rate: 1.081E-05 | global batch size: 16 | lm loss: 6.923852E+00 | loss scale: 16384.0 | grad norm: 70221.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2439/ 159576 | consumed samples: 39024 | elapsed time per iteration (ms): 13814.4 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.729681E+00 | loss scale: 16384.0 | grad norm: 71734.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2440/ 159576 | consumed samples: 39040 | elapsed time per iteration (ms): 13667.6 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.668837E+00 | loss scale: 16384.0 | grad norm: 69995.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2441/ 159576 | consumed samples: 39056 | elapsed time per iteration (ms): 13617.8 | learning rate: 1.082E-05 | global batch size: 16 | lm loss: 6.781438E+00 | loss scale: 16384.0 | grad norm: 49304.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2442/ 159576 | consumed samples: 39072 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.810652E+00 | loss scale: 16384.0 | grad norm: 86564.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2443/ 159576 | consumed samples: 39088 | elapsed time per iteration (ms): 14063.1 | learning rate: 1.083E-05 | global batch size: 16 | lm loss: 6.879047E+00 | loss scale: 16384.0 | grad norm: 56659.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2444/ 159576 | consumed samples: 39104 | elapsed time per iteration (ms): 13586.9 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.494076E+00 | loss scale: 16384.0 | grad norm: 72585.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2445/ 159576 | consumed samples: 39120 | elapsed time per iteration (ms): 13676.6 | learning rate: 1.084E-05 | global batch size: 16 | lm loss: 6.713490E+00 | loss scale: 16384.0 | grad norm: 68348.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2446/ 159576 | consumed samples: 39136 | elapsed time per iteration (ms): 13706.8 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.970970E+00 | loss scale: 16384.0 | grad norm: 145461.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2447/ 159576 | consumed samples: 39152 | elapsed time per iteration (ms): 13581.7 | learning rate: 1.085E-05 | global batch size: 16 | lm loss: 6.777845E+00 | loss scale: 16384.0 | grad norm: 67935.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2448/ 159576 | consumed samples: 39168 | elapsed time per iteration (ms): 13810.2 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.772415E+00 | loss scale: 16384.0 | grad norm: 86835.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2449/ 159576 | consumed samples: 39184 | elapsed time per iteration (ms): 13641.6 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.901608E+00 | loss scale: 16384.0 | grad norm: 86381.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2450/ 159576 | consumed samples: 39200 | elapsed time per iteration (ms): 13577.4 | learning rate: 1.086E-05 | global batch size: 16 | lm loss: 6.923601E+00 | loss scale: 16384.0 | grad norm: 67065.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2451/ 159576 | consumed samples: 39216 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.635858E+00 | loss scale: 16384.0 | grad norm: 118766.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2452/ 159576 | consumed samples: 39232 | elapsed time per iteration (ms): 14182.2 | learning rate: 1.087E-05 | global batch size: 16 | lm loss: 6.798747E+00 | loss scale: 16384.0 | grad norm: 86778.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2453/ 159576 | consumed samples: 39248 | elapsed time per iteration (ms): 13794.7 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.934669E+00 | loss scale: 16384.0 | grad norm: 72867.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2454/ 159576 | consumed samples: 39264 | elapsed time per iteration (ms): 13649.1 | learning rate: 1.088E-05 | global batch size: 16 | lm loss: 6.689157E+00 | loss scale: 16384.0 | grad norm: 53809.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2455/ 159576 | consumed samples: 39280 | elapsed time per iteration (ms): 13619.0 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.797565E+00 | loss scale: 16384.0 | grad norm: 130277.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2456/ 159576 | consumed samples: 39296 | elapsed time per iteration (ms): 14036.7 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.919378E+00 | loss scale: 16384.0 | grad norm: 68731.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2457/ 159576 | consumed samples: 39312 | elapsed time per iteration (ms): 13656.3 | learning rate: 1.089E-05 | global batch size: 16 | lm loss: 6.658165E+00 | loss scale: 16384.0 | grad norm: 90782.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2458/ 159576 | consumed samples: 39328 | elapsed time per iteration (ms): 13635.5 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.614546E+00 | loss scale: 16384.0 | grad norm: 80319.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2459/ 159576 | consumed samples: 39344 | elapsed time per iteration (ms): 13648.3 | learning rate: 1.090E-05 | global batch size: 16 | lm loss: 6.813863E+00 | loss scale: 16384.0 | grad norm: 96291.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2460/ 159576 | consumed samples: 39360 | elapsed time per iteration (ms): 13655.8 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 7.162710E+00 | loss scale: 16384.0 | grad norm: 58863.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2461/ 159576 | consumed samples: 39376 | elapsed time per iteration (ms): 13960.2 | learning rate: 1.091E-05 | global batch size: 16 | lm loss: 6.991768E+00 | loss scale: 16384.0 | grad norm: 72538.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2462/ 159576 | consumed samples: 39392 | elapsed time per iteration (ms): 13649.7 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.712080E+00 | loss scale: 16384.0 | grad norm: 76061.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2463/ 159576 | consumed samples: 39408 | elapsed time per iteration (ms): 13665.9 | learning rate: 1.092E-05 | global batch size: 16 | lm loss: 6.697587E+00 | loss scale: 16384.0 | grad norm: 78444.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2464/ 159576 | consumed samples: 39424 | elapsed time per iteration (ms): 13548.3 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.767040E+00 | loss scale: 16384.0 | grad norm: 71114.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2465/ 159576 | consumed samples: 39440 | elapsed time per iteration (ms): 13972.6 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.750882E+00 | loss scale: 16384.0 | grad norm: 60498.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2466/ 159576 | consumed samples: 39456 | elapsed time per iteration (ms): 13657.9 | learning rate: 1.093E-05 | global batch size: 16 | lm loss: 6.631062E+00 | loss scale: 16384.0 | grad norm: 75019.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2467/ 159576 | consumed samples: 39472 | elapsed time per iteration (ms): 13692.3 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.725332E+00 | loss scale: 16384.0 | grad norm: 53922.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2468/ 159576 | consumed samples: 39488 | elapsed time per iteration (ms): 13656.1 | learning rate: 1.094E-05 | global batch size: 16 | lm loss: 6.736504E+00 | loss scale: 16384.0 | grad norm: 54250.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2469/ 159576 | consumed samples: 39504 | elapsed time per iteration (ms): 14009.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.881338E+00 | loss scale: 16384.0 | grad norm: 64641.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2470/ 159576 | consumed samples: 39520 | elapsed time per iteration (ms): 13853.1 | learning rate: 1.095E-05 | global batch size: 16 | lm loss: 6.742140E+00 | loss scale: 16384.0 | grad norm: 52195.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2471/ 159576 | consumed samples: 39536 | elapsed time per iteration (ms): 13541.2 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.830609E+00 | loss scale: 16384.0 | grad norm: 98883.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2472/ 159576 | consumed samples: 39552 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.096E-05 | global batch size: 16 | lm loss: 6.770423E+00 | loss scale: 16384.0 | grad norm: 66896.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2473/ 159576 | consumed samples: 39568 | elapsed time per iteration (ms): 13623.5 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.926878E+00 | loss scale: 16384.0 | grad norm: 74406.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2474/ 159576 | consumed samples: 39584 | elapsed time per iteration (ms): 14089.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.834147E+00 | loss scale: 16384.0 | grad norm: 61442.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2475/ 159576 | consumed samples: 39600 | elapsed time per iteration (ms): 13713.9 | learning rate: 1.097E-05 | global batch size: 16 | lm loss: 6.711390E+00 | loss scale: 16384.0 | grad norm: 72993.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2476/ 159576 | consumed samples: 39616 | elapsed time per iteration (ms): 13666.0 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 6.715760E+00 | loss scale: 16384.0 | grad norm: 54753.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2477/ 159576 | consumed samples: 39632 | elapsed time per iteration (ms): 13628.3 | learning rate: 1.098E-05 | global batch size: 16 | lm loss: 7.034068E+00 | loss scale: 16384.0 | grad norm: 65362.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2478/ 159576 | consumed samples: 39648 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.848239E+00 | loss scale: 16384.0 | grad norm: 59886.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
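Note the loss scale doubling from 16384.0 to 32768.0 at iteration 2479 below. This is mixed-precision dynamic loss scaling at work: after a sufficiently long run of overflow-free steps (the "number of skipped iterations: 0" and "number of nan iterations: 0" counters above), the scaler doubles the scale; on an overflow it would instead skip the step and halve it. The reported grad norm also roughly doubles from 2479 onward, which suggests the logged norm is taken on the still-scaled gradients. Below is a minimal sketch of this behaviour, not the actual Megatron-DeepSpeed implementation; the growth interval of 1000 clean steps is an assumption (the real run's value comes from the --loss-scale-window argument).

```python
# Minimal sketch of dynamic loss scaling, assuming a growth interval of
# 1000 overflow-free steps. This models the behaviour visible in the log;
# it is NOT the actual Megatron-DeepSpeed scaler.
class DynamicLossScaler:
    def __init__(self, init_scale=16384.0, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive overflow-free steps

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            # Overflowing step: skip it and halve the scale (this would
            # show up in the log as "number of skipped iterations: 1").
            self.scale /= 2.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # Long clean run: double the scale -- the 16384.0 -> 32768.0
                # jump seen here at iteration 2479.
                self.scale *= 2.0
                self._good_steps = 0

scaler = DynamicLossScaler()
for _ in range(1000):
    scaler.update(found_overflow=False)
assert scaler.scale == 32768.0
```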
iteration 2479/ 159576 | consumed samples: 39664 | elapsed time per iteration (ms): 13518.2 | learning rate: 1.099E-05 | global batch size: 16 | lm loss: 6.766425E+00 | loss scale: 32768.0 | grad norm: 47600.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2480/ 159576 | consumed samples: 39680 | elapsed time per iteration (ms): 13611.4 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.569361E+00 | loss scale: 32768.0 | grad norm: 173183.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2481/ 159576 | consumed samples: 39696 | elapsed time per iteration (ms): 13649.6 | learning rate: 1.100E-05 | global batch size: 16 | lm loss: 6.977244E+00 | loss scale: 32768.0 | grad norm: 114608.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2482/ 159576 | consumed samples: 39712 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.743002E+00 | loss scale: 32768.0 | grad norm: 157122.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2483/ 159576 | consumed samples: 39728 | elapsed time per iteration (ms): 13957.3 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.786878E+00 | loss scale: 32768.0 | grad norm: 124608.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2484/ 159576 | consumed samples: 39744 | elapsed time per iteration (ms): 13654.6 | learning rate: 1.101E-05 | global batch size: 16 | lm loss: 6.859965E+00 | loss scale: 32768.0 | grad norm: 232222.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2485/ 159576 | consumed samples: 39760 | elapsed time per iteration (ms): 13613.9 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.802356E+00 | loss scale: 32768.0 | grad norm: 156829.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2486/ 159576 | consumed samples: 39776 | elapsed time per iteration (ms): 13653.4 | learning rate: 1.102E-05 | global batch size: 16 | lm loss: 6.710648E+00 | loss scale: 32768.0 | grad norm: 134523.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2487/ 159576 | consumed samples: 39792 | elapsed time per iteration (ms): 14072.7 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.797608E+00 | loss scale: 32768.0 | grad norm: 125011.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2488/ 159576 | consumed samples: 39808 | elapsed time per iteration (ms): 13639.9 | learning rate: 1.103E-05 | global batch size: 16 | lm loss: 6.854223E+00 | loss scale: 32768.0 | grad norm: 260551.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2489/ 159576 | consumed samples: 39824 | elapsed time per iteration (ms): 13577.6 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.603992E+00 | loss scale: 32768.0 | grad norm: 181893.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2490/ 159576 | consumed samples: 39840 | elapsed time per iteration (ms): 13675.7 | learning rate: 1.104E-05 | global batch size: 16 | lm loss: 6.694830E+00 | loss scale: 32768.0 | grad norm: 141757.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2491/ 159576 | consumed samples: 39856 | elapsed time per iteration (ms): 14083.9 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.642892E+00 | loss scale: 32768.0 | grad norm: 119287.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2492/ 159576 | consumed samples: 39872 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.801910E+00 | loss scale: 32768.0 | grad norm: 155539.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2493/ 159576 | consumed samples: 39888 | elapsed time per iteration (ms): 13598.7 | learning rate: 1.105E-05 | global batch size: 16 | lm loss: 6.791874E+00 | loss scale: 32768.0 | grad norm: 122407.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2494/ 159576 | consumed samples: 39904 | elapsed time per iteration (ms): 13643.8 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.826643E+00 | loss scale: 32768.0 | grad norm: 128586.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2495/ 159576 | consumed samples: 39920 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.106E-05 | global batch size: 16 | lm loss: 6.715306E+00 | loss scale: 32768.0 | grad norm: 99484.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2496/ 159576 | consumed samples: 39936 | elapsed time per iteration (ms): 13754.1 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.833625E+00 | loss scale: 32768.0 | grad norm: 115202.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2497/ 159576 | consumed samples: 39952 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.107E-05 | global batch size: 16 | lm loss: 6.915625E+00 | loss scale: 32768.0 | grad norm: 186838.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2498/ 159576 | consumed samples: 39968 | elapsed time per iteration (ms): 13644.0 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.967087E+00 | loss scale: 32768.0 | grad norm: 131122.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2499/ 159576 | consumed samples: 39984 | elapsed time per iteration (ms): 13681.7 | learning rate: 1.108E-05 | global batch size: 16 | lm loss: 6.760918E+00 | loss scale: 32768.0 | grad norm: 194624.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2500/ 159576 | consumed samples: 40000 | elapsed time per iteration (ms): 14007.6 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.979738E+00 | loss scale: 32768.0 | grad norm: 156689.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2501/ 159576 | consumed samples: 40016 | elapsed time per iteration (ms): 13617.5 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.789479E+00 | loss scale: 32768.0 | grad norm: 144780.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
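The consumed-samples counter advances by exactly the global batch size every iteration, so while the batch size sits at a constant 16 the two columns are locked together: iteration 2500 above lands exactly on 40000 consumed samples. A quick sanity check on that bookkeeping (the pairs are copied from the log lines above):

```python
# Consumed samples = iteration * global batch size while the batch size is
# constant at 16; the (iteration, consumed) pairs are taken from the log.
GLOBAL_BATCH_SIZE = 16
for iteration, consumed in [(2352, 37632), (2479, 39664), (2500, 40000)]:
    assert consumed == iteration * GLOBAL_BATCH_SIZE
```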
iteration 2502/ 159576 | consumed samples: 40032 | elapsed time per iteration (ms): 13599.5 | learning rate: 1.109E-05 | global batch size: 16 | lm loss: 6.864005E+00 | loss scale: 32768.0 | grad norm: 170229.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2503/ 159576 | consumed samples: 40048 | elapsed time per iteration (ms): 13573.2 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.666573E+00 | loss scale: 32768.0 | grad norm: 146264.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2504/ 159576 | consumed samples: 40064 | elapsed time per iteration (ms): 13981.7 | learning rate: 1.110E-05 | global batch size: 16 | lm loss: 6.757555E+00 | loss scale: 32768.0 | grad norm: 194432.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2505/ 159576 | consumed samples: 40080 | elapsed time per iteration (ms): 13815.5 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 7.060199E+00 | loss scale: 32768.0 | grad norm: 107664.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2506/ 159576 | consumed samples: 40096 | elapsed time per iteration (ms): 13708.3 | learning rate: 1.111E-05 | global batch size: 16 | lm loss: 6.757818E+00 | loss scale: 32768.0 | grad norm: 172391.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2507/ 159576 | consumed samples: 40112 | elapsed time per iteration (ms): 13682.1 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.957751E+00 | loss scale: 32768.0 | grad norm: 153732.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2508/ 159576 | consumed samples: 40128 | elapsed time per iteration (ms): 13651.8 | learning rate: 1.112E-05 | global batch size: 16 | lm loss: 6.697278E+00 | loss scale: 32768.0 | grad norm: 269873.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2509/ 159576 | consumed samples: 40144 | elapsed time per iteration (ms): 13847.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.915687E+00 | loss scale: 32768.0 | grad norm: 203672.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2510/ 159576 | consumed samples: 40160 | elapsed time per iteration (ms): 13726.7 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.563999E+00 | loss scale: 32768.0 | grad norm: 156793.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2511/ 159576 | consumed samples: 40176 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.113E-05 | global batch size: 16 | lm loss: 6.816392E+00 | loss scale: 32768.0 | grad norm: 174319.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2512/ 159576 | consumed samples: 40192 | elapsed time per iteration (ms): 13663.1 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.610006E+00 | loss scale: 32768.0 | grad norm: 205941.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2513/ 159576 | consumed samples: 40208 | elapsed time per iteration (ms): 13997.4 | learning rate: 1.114E-05 | global batch size: 16 | lm loss: 6.968318E+00 | loss scale: 32768.0 | grad norm: 198426.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2514/ 159576 | consumed samples: 40224 | elapsed time per iteration (ms): 13639.5 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 6.754237E+00 | loss scale: 32768.0 | grad norm: 150994.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2515/ 159576 | consumed samples: 40240 | elapsed time per iteration (ms): 13721.6 | learning rate: 1.115E-05 | global batch size: 16 | lm loss: 6.780080E+00 | loss scale: 32768.0 | grad norm: 221933.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2516/ 159576 | consumed samples: 40256 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.005465E+00 | loss scale: 32768.0 | grad norm: 111981.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2517/ 159576 | consumed samples: 40272 | elapsed time per iteration (ms): 13636.9 | learning rate: 1.116E-05 | global batch size: 16 | lm loss: 7.038844E+00 | loss scale: 32768.0 | grad norm: 207331.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2518/ 159576 | consumed samples: 40288 | elapsed time per iteration (ms): 13872.4 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.753989E+00 | loss scale: 32768.0 | grad norm: 152725.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2519/ 159576 | consumed samples: 40304 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.981558E+00 | loss scale: 32768.0 | grad norm: 154949.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2520/ 159576 | consumed samples: 40320 | elapsed time per iteration (ms): 13684.9 | learning rate: 1.117E-05 | global batch size: 16 | lm loss: 6.906241E+00 | loss scale: 32768.0 | grad norm: 125549.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2521/ 159576 | consumed samples: 40336 | elapsed time per iteration (ms): 13716.2 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.747027E+00 | loss scale: 32768.0 | grad norm: 122780.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2522/ 159576 | consumed samples: 40352 | elapsed time per iteration (ms): 14167.1 | learning rate: 1.118E-05 | global batch size: 16 | lm loss: 6.970352E+00 | loss scale: 32768.0 | grad norm: 118819.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2523/ 159576 | consumed samples: 40368 | elapsed time per iteration (ms): 13664.4 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.714174E+00 | loss scale: 32768.0 | grad norm: 146027.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2524/ 159576 | consumed samples: 40384 | elapsed time per iteration (ms): 13630.7 | learning rate: 1.119E-05 | global batch size: 16 | lm loss: 6.610335E+00 | loss scale: 32768.0 | grad norm: 242081.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2525/ 159576 | consumed samples: 40400 | elapsed time per iteration (ms): 13685.5 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.889633E+00 | loss scale: 32768.0 | grad norm: 125371.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2526/ 159576 | consumed samples: 40416 | elapsed time per iteration (ms): 13989.6 | learning rate: 1.120E-05 | global batch size: 16 | lm loss: 6.703308E+00 | loss scale: 32768.0 | grad norm: 229244.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2527/ 159576 | consumed samples: 40432 | elapsed time per iteration (ms): 13653.7 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.903625E+00 | loss scale: 32768.0 | grad norm: 180615.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2528/ 159576 | consumed samples: 40448 | elapsed time per iteration (ms): 13688.8 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.882591E+00 | loss scale: 32768.0 | grad norm: 123446.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2529/ 159576 | consumed samples: 40464 | elapsed time per iteration (ms): 13727.9 | learning rate: 1.121E-05 | global batch size: 16 | lm loss: 6.771068E+00 | loss scale: 32768.0 | grad norm: 136122.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2530/ 159576 | consumed samples: 40480 | elapsed time per iteration (ms): 13727.3 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.839997E+00 | loss scale: 32768.0 | grad norm: 198759.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2531/ 159576 | consumed samples: 40496 | elapsed time per iteration (ms): 13882.2 | learning rate: 1.122E-05 | global batch size: 16 | lm loss: 6.934726E+00 | loss scale: 32768.0 | grad norm: 140393.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2532/ 159576 | consumed samples: 40512 | elapsed time per iteration (ms): 13707.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.824786E+00 | loss scale: 32768.0 | grad norm: 136497.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2533/ 159576 | consumed samples: 40528 | elapsed time per iteration (ms): 13668.7 | learning rate: 1.123E-05 | global batch size: 16 | lm loss: 6.638996E+00 | loss scale: 32768.0 | grad norm: 108086.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2534/ 159576 | consumed samples: 40544 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.684957E+00 | loss scale: 32768.0 | grad norm: 136205.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2535/ 159576 | consumed samples: 40560 | elapsed time per iteration (ms): 14008.2 | learning rate: 1.124E-05 | global batch size: 16 | lm loss: 6.650595E+00 | loss scale: 32768.0 | grad norm: 89458.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2536/ 159576 | consumed samples: 40576 | elapsed time per iteration (ms): 13696.2 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.720654E+00 | loss scale: 32768.0 | grad norm: 207949.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2537/ 159576 | consumed samples: 40592 | elapsed time per iteration (ms): 13728.0 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.934484E+00 | loss scale: 32768.0 | grad norm: 145165.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2538/ 159576 | consumed samples: 40608 | elapsed time per iteration (ms): 13707.3 | learning rate: 1.125E-05 | global batch size: 16 | lm loss: 6.659933E+00 | loss scale: 32768.0 | grad norm: 109227.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2539/ 159576 | consumed samples: 40624 | elapsed time per iteration (ms): 14115.0 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.638377E+00 | loss scale: 32768.0 | grad norm: 221623.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2540/ 159576 | consumed samples: 40640 | elapsed time per iteration (ms): 13557.7 | learning rate: 1.126E-05 | global batch size: 16 | lm loss: 6.825821E+00 | loss scale: 32768.0 | grad norm: 114656.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2541/ 159576 | consumed samples: 40656 | elapsed time per iteration (ms): 13635.6 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.869952E+00 | loss scale: 32768.0 | grad norm: 204975.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2542/ 159576 | consumed samples: 40672 | elapsed time per iteration (ms): 13682.2 | learning rate: 1.127E-05 | global batch size: 16 | lm loss: 6.829473E+00 | loss scale: 32768.0 | grad norm: 158875.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2543/ 159576 | consumed samples: 40688 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.921135E+00 | loss scale: 32768.0 | grad norm: 248424.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2544/ 159576 | consumed samples: 40704 | elapsed time per iteration (ms): 14035.2 | learning rate: 1.128E-05 | global batch size: 16 | lm loss: 6.734321E+00 | loss scale: 32768.0 | grad norm: 137358.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2545/ 159576 | consumed samples: 40720 | elapsed time per iteration (ms): 13685.4 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.824071E+00 | loss scale: 32768.0 | grad norm: 172473.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2546/ 159576 | consumed samples: 40736 | elapsed time per iteration (ms): 13704.2 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.741428E+00 | loss scale: 32768.0 | grad norm: 117821.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2547/ 159576 | consumed samples: 40752 | elapsed time per iteration (ms): 13625.1 | learning rate: 1.129E-05 | global batch size: 16 | lm loss: 6.825446E+00 | loss scale: 32768.0 | grad norm: 302813.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2548/ 159576 | consumed samples: 40768 | elapsed time per iteration (ms): 13978.9 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.930991E+00 | loss scale: 32768.0 | grad norm: 163222.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2549/ 159576 | consumed samples: 40784 | elapsed time per iteration (ms): 13605.2 | learning rate: 1.130E-05 | global batch size: 16 | lm loss: 6.901045E+00 | loss scale: 32768.0 | grad norm: 178776.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2550/ 159576 | consumed samples: 40800 | elapsed time per iteration (ms): 13704.5 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.923467E+00 | loss scale: 32768.0 | grad norm: 156500.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2551/ 159576 | consumed samples: 40816 | elapsed time per iteration (ms): 13642.0 | learning rate: 1.131E-05 | global batch size: 16 | lm loss: 6.698053E+00 | loss scale: 32768.0 | grad norm: 142885.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2552/ 159576 | consumed samples: 40832 | elapsed time per iteration (ms): 13988.3 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.774540E+00 | loss scale: 32768.0 | grad norm: 236886.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2553/ 159576 | consumed samples: 40848 | elapsed time per iteration (ms): 13862.8 | learning rate: 1.132E-05 | global batch size: 16 | lm loss: 6.706432E+00 | loss scale: 32768.0 | grad norm: 178546.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2554/ 159576 | consumed samples: 40864 | elapsed time per iteration (ms): 13629.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.631795E+00 | loss scale: 32768.0 | grad norm: 176739.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2555/ 159576 | consumed samples: 40880 | elapsed time per iteration (ms): 13608.3 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 7.180985E+00 | loss scale: 32768.0 | grad norm: 132584.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2556/ 159576 | consumed samples: 40896 | elapsed time per iteration (ms): 13580.0 | learning rate: 1.133E-05 | global batch size: 16 | lm loss: 6.838911E+00 | loss scale: 32768.0 | grad norm: 90158.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2557/ 159576 | consumed samples: 40912 | elapsed time per iteration (ms): 13942.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.693833E+00 | loss scale: 32768.0 | grad norm: 220674.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2558/ 159576 | consumed samples: 40928 | elapsed time per iteration (ms): 13802.7 | learning rate: 1.134E-05 | global batch size: 16 | lm loss: 6.568502E+00 | loss scale: 32768.0 | grad norm: 98298.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2559/ 159576 | consumed samples: 40944 | elapsed time per iteration (ms): 13641.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 6.635581E+00 | loss scale: 32768.0 | grad norm: 169974.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2560/ 159576 | consumed samples: 40960 | elapsed time per iteration (ms): 13704.3 | learning rate: 1.135E-05 | global batch size: 16 | lm loss: 6.565581E+00 | loss scale: 32768.0 | grad norm: 129387.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2561/ 159576 | consumed samples: 40976 | elapsed time per iteration (ms): 14001.7 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.892058E+00 | loss scale: 32768.0 | grad norm: 339367.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2562/ 159576 | consumed samples: 40992 | elapsed time per iteration (ms): 13513.6 | learning rate: 1.136E-05 | global batch size: 16 | lm loss: 6.762362E+00 | loss scale: 32768.0 | grad norm: 232794.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2563/ 159576 | consumed samples: 41008 | elapsed time per iteration (ms): 13601.0 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.843441E+00 | loss scale: 32768.0 | grad norm: 163664.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2564/ 159576 | consumed samples: 41024 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.819015E+00 | loss scale: 32768.0 | grad norm: 216339.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2565/ 159576 | consumed samples: 41040 | elapsed time per iteration (ms): 13605.6 | learning rate: 1.137E-05 | global batch size: 16 | lm loss: 6.897832E+00 | loss scale: 32768.0 | grad norm: 109607.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2566/ 159576 | consumed samples: 41056 | elapsed time per iteration (ms): 13861.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.875384E+00 | loss scale: 32768.0 | grad norm: 402667.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2567/ 159576 | consumed samples: 41072 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.138E-05 | global batch size: 16 | lm loss: 6.761830E+00 | loss scale: 32768.0 | grad norm: 270664.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2568/ 159576 | consumed samples: 41088 | elapsed time per iteration (ms): 13672.1 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.792128E+00 | loss scale: 32768.0 | grad norm: 176693.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2569/ 159576 | consumed samples: 41104 | elapsed time per iteration (ms): 13609.9 | learning rate: 1.139E-05 | global batch size: 16 | lm loss: 6.935215E+00 | loss scale: 32768.0 | grad norm: 182679.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 12:06:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 12:06:15] PULSE: tr8-104B is running for 6:14:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
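The two PULSE lines above are the monitoring watchdog reporting SLURM state: array task 1162855_1 has been running for just over six hours, while the next submission waits behind it via the "dependency mechanism". The pending job's display name, 1165978_[1-10%1], is how SLURM shows a job array with a %1 throttle: at most one array task runs at a time, so the next training chunk is scheduled automatically when the current one exits. A hypothetical resubmission sketch of that pattern follows; the .slurm script name is made up, while the --array syntax is standard SLURM.

```python
# Hypothetical sketch of the scheduling pattern the PULSE line refers to:
# a SLURM job array with a "%1" throttle runs at most one task at a time,
# so task N+1 starts only after task N finishes. The script file name is
# an assumption, not taken from the log.
import subprocess

subprocess.run(
    ["sbatch", "--array=1-10%1", "tr8-104B.slurm"],  # 10 chained training chunks
    check=True,
)
```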
global batch size: 16 | lm loss: 6.826759E+00 | loss scale: 32768.0 | grad norm: 135711.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2571/ 159576 | consumed samples: 41136 | elapsed time per iteration (ms): 13749.2 | learning rate: 1.140E-05 | global batch size: 16 | lm loss: 6.600703E+00 | loss scale: 32768.0 | grad norm: 143461.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2572/ 159576 | consumed samples: 41152 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.747102E+00 | loss scale: 32768.0 | grad norm: 205480.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2573/ 159576 | consumed samples: 41168 | elapsed time per iteration (ms): 13680.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.767237E+00 | loss scale: 32768.0 | grad norm: 186807.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2574/ 159576 | consumed samples: 41184 | elapsed time per iteration (ms): 14103.7 | learning rate: 1.141E-05 | global batch size: 16 | lm loss: 6.786840E+00 | loss scale: 32768.0 | grad norm: 125986.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2575/ 159576 | consumed samples: 41200 | elapsed time per iteration (ms): 13634.6 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.740016E+00 | loss scale: 32768.0 | grad norm: 127578.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2576/ 159576 | consumed samples: 41216 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.142E-05 | global batch size: 16 | lm loss: 6.717787E+00 | loss scale: 32768.0 | grad norm: 91352.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2577/ 159576 | consumed samples: 41232 | elapsed time per iteration (ms): 13613.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.736307E+00 | loss scale: 32768.0 | grad norm: 161126.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2578/ 159576 | consumed samples: 41248 | elapsed time per iteration (ms): 13501.7 | learning rate: 1.143E-05 | global batch size: 16 | lm loss: 6.725785E+00 | loss scale: 32768.0 | grad norm: 105065.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2579/ 159576 | consumed samples: 41264 | elapsed time per iteration (ms): 13746.0 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.731723E+00 | loss scale: 32768.0 | grad norm: 123413.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2580/ 159576 | consumed samples: 41280 | elapsed time per iteration (ms): 13621.8 | learning rate: 1.144E-05 | global batch size: 16 | lm loss: 6.889888E+00 | loss scale: 32768.0 | grad norm: 128934.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2581/ 159576 | consumed samples: 41296 | elapsed time per iteration (ms): 13634.3 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.845993E+00 | loss scale: 32768.0 | grad norm: 140353.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 2582/ 159576 | consumed samples: 41312 | elapsed time per iteration (ms): 13645.1 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.922751E+00 | loss scale: 32768.0 | grad norm: 193649.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2583/ 159576 | consumed samples: 41328 | elapsed time per iteration (ms): 14012.6 | learning rate: 1.145E-05 | global batch size: 16 | lm loss: 6.706060E+00 | loss scale: 32768.0 | grad norm: 120536.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2584/ 159576 | consumed samples: 41344 | elapsed time per iteration (ms): 13567.7 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.729124E+00 | loss scale: 32768.0 | grad norm: 150036.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2585/ 159576 | consumed samples: 41360 | elapsed time per iteration (ms): 13534.2 | learning rate: 1.146E-05 | global batch size: 16 | lm loss: 6.841982E+00 | loss scale: 32768.0 | grad norm: 169788.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2586/ 159576 | consumed samples: 41376 | elapsed time per iteration (ms): 13556.0 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.813578E+00 | loss scale: 32768.0 | grad norm: 120615.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2587/ 159576 | consumed samples: 41392 | elapsed time per iteration (ms): 13668.2 | learning rate: 1.147E-05 | global batch size: 16 | lm loss: 6.675393E+00 | loss scale: 32768.0 | grad norm: 202372.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2588/ 159576 | consumed samples: 41408 | elapsed time per iteration (ms): 13867.2 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.796386E+00 | loss scale: 32768.0 | grad norm: 131901.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2589/ 159576 | consumed samples: 41424 | elapsed time per iteration (ms): 13636.7 | learning rate: 1.148E-05 | global batch size: 16 | lm loss: 6.783171E+00 | loss scale: 32768.0 | grad norm: 127655.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2590/ 159576 | consumed samples: 41440 | elapsed time per iteration (ms): 13677.9 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.672108E+00 | loss scale: 32768.0 | grad norm: 111803.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2591/ 159576 | consumed samples: 41456 | elapsed time per iteration (ms): 13670.0 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.894643E+00 | loss scale: 32768.0 | grad norm: 156503.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2592/ 159576 | consumed samples: 41472 | elapsed time per iteration (ms): 14137.5 | learning rate: 1.149E-05 | global batch size: 16 | lm loss: 6.765024E+00 | loss scale: 32768.0 | grad norm: 160594.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2593/ 159576 | consumed samples: 41488 | elapsed time per iteration (ms): 13635.7 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.882227E+00 | loss scale: 32768.0 
| grad norm: 142008.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2594/ 159576 | consumed samples: 41504 | elapsed time per iteration (ms): 13592.8 | learning rate: 1.150E-05 | global batch size: 16 | lm loss: 6.750668E+00 | loss scale: 32768.0 | grad norm: 137376.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2595/ 159576 | consumed samples: 41520 | elapsed time per iteration (ms): 13572.7 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.870511E+00 | loss scale: 32768.0 | grad norm: 203139.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2596/ 159576 | consumed samples: 41536 | elapsed time per iteration (ms): 13955.3 | learning rate: 1.151E-05 | global batch size: 16 | lm loss: 6.952578E+00 | loss scale: 32768.0 | grad norm: 259660.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2597/ 159576 | consumed samples: 41552 | elapsed time per iteration (ms): 13711.6 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.681178E+00 | loss scale: 32768.0 | grad norm: 126907.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2598/ 159576 | consumed samples: 41568 | elapsed time per iteration (ms): 13707.8 | learning rate: 1.152E-05 | global batch size: 16 | lm loss: 6.610268E+00 | loss scale: 32768.0 | grad norm: 135897.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2599/ 159576 | consumed samples: 41584 | elapsed time per iteration (ms): 13564.4 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.826151E+00 | loss scale: 32768.0 | grad norm: 155911.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2600/ 159576 | consumed samples: 41600 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.632576E+00 | loss scale: 32768.0 | grad norm: 252409.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2601/ 159576 | consumed samples: 41616 | elapsed time per iteration (ms): 13887.8 | learning rate: 1.153E-05 | global batch size: 16 | lm loss: 6.631788E+00 | loss scale: 32768.0 | grad norm: 165940.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2602/ 159576 | consumed samples: 41632 | elapsed time per iteration (ms): 13567.8 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.939396E+00 | loss scale: 32768.0 | grad norm: 124805.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2603/ 159576 | consumed samples: 41648 | elapsed time per iteration (ms): 13581.4 | learning rate: 1.154E-05 | global batch size: 16 | lm loss: 6.924129E+00 | loss scale: 32768.0 | grad norm: 133938.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2604/ 159576 | consumed samples: 41664 | elapsed time per iteration (ms): 13613.2 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.660190E+00 | loss scale: 32768.0 | grad norm: 188689.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2605/ 159576 | consumed samples: 41680 | elapsed time per 
iteration (ms): 14144.8 | learning rate: 1.155E-05 | global batch size: 16 | lm loss: 6.643148E+00 | loss scale: 32768.0 | grad norm: 123140.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2606/ 159576 | consumed samples: 41696 | elapsed time per iteration (ms): 13667.3 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.805959E+00 | loss scale: 32768.0 | grad norm: 196566.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2607/ 159576 | consumed samples: 41712 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.156E-05 | global batch size: 16 | lm loss: 6.711599E+00 | loss scale: 32768.0 | grad norm: 167578.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2608/ 159576 | consumed samples: 41728 | elapsed time per iteration (ms): 13571.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.852364E+00 | loss scale: 32768.0 | grad norm: 120545.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2609/ 159576 | consumed samples: 41744 | elapsed time per iteration (ms): 13823.4 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.988579E+00 | loss scale: 32768.0 | grad norm: 242130.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2610/ 159576 | consumed samples: 41760 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.157E-05 | global batch size: 16 | lm loss: 6.640975E+00 | loss scale: 32768.0 | grad norm: 193270.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2611/ 159576 | consumed samples: 41776 | elapsed time per iteration (ms): 13648.9 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.554218E+00 | loss scale: 32768.0 | grad norm: 132307.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2612/ 159576 | consumed samples: 41792 | elapsed time per iteration (ms): 13675.5 | learning rate: 1.158E-05 | global batch size: 16 | lm loss: 6.875402E+00 | loss scale: 32768.0 | grad norm: 127017.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2613/ 159576 | consumed samples: 41808 | elapsed time per iteration (ms): 13589.6 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.853450E+00 | loss scale: 32768.0 | grad norm: 271835.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2614/ 159576 | consumed samples: 41824 | elapsed time per iteration (ms): 13981.2 | learning rate: 1.159E-05 | global batch size: 16 | lm loss: 6.810247E+00 | loss scale: 32768.0 | grad norm: 210644.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2615/ 159576 | consumed samples: 41840 | elapsed time per iteration (ms): 13580.3 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.856892E+00 | loss scale: 32768.0 | grad norm: 139996.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2616/ 159576 | consumed samples: 41856 | elapsed time per iteration (ms): 13592.7 | learning rate: 1.160E-05 | global batch size: 16 | lm loss: 6.687234E+00 | loss scale: 32768.0 | grad norm: 130216.414 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2617/ 159576 | consumed samples: 41872 | elapsed time per iteration (ms): 13579.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.753475E+00 | loss scale: 32768.0 | grad norm: 270435.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2618/ 159576 | consumed samples: 41888 | elapsed time per iteration (ms): 14037.5 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.964073E+00 | loss scale: 32768.0 | grad norm: 185416.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2619/ 159576 | consumed samples: 41904 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.161E-05 | global batch size: 16 | lm loss: 6.609634E+00 | loss scale: 32768.0 | grad norm: 157098.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2620/ 159576 | consumed samples: 41920 | elapsed time per iteration (ms): 13574.2 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 7.006974E+00 | loss scale: 32768.0 | grad norm: 140378.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2621/ 159576 | consumed samples: 41936 | elapsed time per iteration (ms): 13648.0 | learning rate: 1.162E-05 | global batch size: 16 | lm loss: 6.562167E+00 | loss scale: 32768.0 | grad norm: 169654.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2622/ 159576 | consumed samples: 41952 | elapsed time per iteration (ms): 13713.4 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.810758E+00 | loss scale: 32768.0 | grad norm: 209798.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2623/ 159576 | consumed samples: 41968 | elapsed time per iteration (ms): 13925.7 | learning rate: 1.163E-05 | global batch size: 16 | lm loss: 6.522465E+00 | loss scale: 32768.0 | grad norm: 119471.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2624/ 159576 | consumed samples: 41984 | elapsed time per iteration (ms): 13583.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.827784E+00 | loss scale: 32768.0 | grad norm: 115498.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2625/ 159576 | consumed samples: 42000 | elapsed time per iteration (ms): 13618.7 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.663583E+00 | loss scale: 32768.0 | grad norm: 131333.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2626/ 159576 | consumed samples: 42016 | elapsed time per iteration (ms): 13695.0 | learning rate: 1.164E-05 | global batch size: 16 | lm loss: 6.731676E+00 | loss scale: 32768.0 | grad norm: 105476.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2627/ 159576 | consumed samples: 42032 | elapsed time per iteration (ms): 14032.3 | learning rate: 1.165E-05 | global batch size: 16 | lm loss: 6.635394E+00 | loss scale: 32768.0 | grad norm: 155841.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2628/ 159576 | consumed samples: 42048 | elapsed time per iteration (ms): 13596.4 | learning rate: 1.165E-05 | global 
batch size: 16 | lm loss: 6.768427E+00 | loss scale: 32768.0 | grad norm: 91352.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2629/ 159576 | consumed samples: 42064 | elapsed time per iteration (ms): 13735.4 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.877464E+00 | loss scale: 32768.0 | grad norm: 246645.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2630/ 159576 | consumed samples: 42080 | elapsed time per iteration (ms): 13558.6 | learning rate: 1.166E-05 | global batch size: 16 | lm loss: 6.714092E+00 | loss scale: 32768.0 | grad norm: 131077.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2631/ 159576 | consumed samples: 42096 | elapsed time per iteration (ms): 14063.2 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.598214E+00 | loss scale: 32768.0 | grad norm: 142113.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2632/ 159576 | consumed samples: 42112 | elapsed time per iteration (ms): 13570.0 | learning rate: 1.167E-05 | global batch size: 16 | lm loss: 6.958339E+00 | loss scale: 32768.0 | grad norm: 196255.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2633/ 159576 | consumed samples: 42128 | elapsed time per iteration (ms): 13592.6 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.596231E+00 | loss scale: 32768.0 | grad norm: 167680.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2634/ 159576 | consumed samples: 42144 | elapsed time per iteration (ms): 13671.7 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.775526E+00 | loss scale: 32768.0 | grad norm: 111055.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2635/ 159576 | consumed samples: 42160 | elapsed time per iteration (ms): 13642.2 | learning rate: 1.168E-05 | global batch size: 16 | lm loss: 6.786438E+00 | loss scale: 32768.0 | grad norm: 146172.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2636/ 159576 | consumed samples: 42176 | elapsed time per iteration (ms): 14001.7 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.785826E+00 | loss scale: 32768.0 | grad norm: 101705.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2637/ 159576 | consumed samples: 42192 | elapsed time per iteration (ms): 13632.3 | learning rate: 1.169E-05 | global batch size: 16 | lm loss: 6.918137E+00 | loss scale: 32768.0 | grad norm: 359289.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2638/ 159576 | consumed samples: 42208 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.474925E+00 | loss scale: 32768.0 | grad norm: 210644.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2639/ 159576 | consumed samples: 42224 | elapsed time per iteration (ms): 13584.1 | learning rate: 1.170E-05 | global batch size: 16 | lm loss: 6.622705E+00 | loss scale: 32768.0 | grad norm: 159853.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 
2640/ 159576 | consumed samples: 42240 | elapsed time per iteration (ms): 13928.4 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.883276E+00 | loss scale: 32768.0 | grad norm: 134874.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2641/ 159576 | consumed samples: 42256 | elapsed time per iteration (ms): 13672.3 | learning rate: 1.171E-05 | global batch size: 16 | lm loss: 6.975843E+00 | loss scale: 32768.0 | grad norm: 136138.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2642/ 159576 | consumed samples: 42272 | elapsed time per iteration (ms): 13705.7 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.698567E+00 | loss scale: 32768.0 | grad norm: 132708.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2643/ 159576 | consumed samples: 42288 | elapsed time per iteration (ms): 13640.4 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.910300E+00 | loss scale: 32768.0 | grad norm: 128937.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2644/ 159576 | consumed samples: 42304 | elapsed time per iteration (ms): 13924.6 | learning rate: 1.172E-05 | global batch size: 16 | lm loss: 6.661136E+00 | loss scale: 32768.0 | grad norm: 144385.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2645/ 159576 | consumed samples: 42320 | elapsed time per iteration (ms): 13731.5 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.749330E+00 | loss scale: 32768.0 | grad norm: 136497.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2646/ 159576 | consumed samples: 42336 | elapsed time per iteration (ms): 13631.6 | learning rate: 1.173E-05 | global batch size: 16 | lm loss: 6.774727E+00 | loss scale: 32768.0 | grad norm: 157115.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2647/ 159576 | consumed samples: 42352 | elapsed time per iteration (ms): 13587.3 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.897247E+00 | loss scale: 32768.0 | grad norm: 122884.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2648/ 159576 | consumed samples: 42368 | elapsed time per iteration (ms): 13582.9 | learning rate: 1.174E-05 | global batch size: 16 | lm loss: 6.902627E+00 | loss scale: 32768.0 | grad norm: 136617.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2649/ 159576 | consumed samples: 42384 | elapsed time per iteration (ms): 14194.1 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.654990E+00 | loss scale: 32768.0 | grad norm: 121668.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2650/ 159576 | consumed samples: 42400 | elapsed time per iteration (ms): 13827.0 | learning rate: 1.175E-05 | global batch size: 16 | lm loss: 6.718140E+00 | loss scale: 32768.0 | grad norm: 94592.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2651/ 159576 | consumed samples: 42416 | elapsed time per iteration (ms): 13600.7 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.674122E+00 | loss scale: 32768.0 | grad 
norm: 105220.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2652/ 159576 | consumed samples: 42432 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.662145E+00 | loss scale: 32768.0 | grad norm: 222158.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2653/ 159576 | consumed samples: 42448 | elapsed time per iteration (ms): 13957.5 | learning rate: 1.176E-05 | global batch size: 16 | lm loss: 6.613699E+00 | loss scale: 32768.0 | grad norm: 110830.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2654/ 159576 | consumed samples: 42464 | elapsed time per iteration (ms): 13668.1 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.510882E+00 | loss scale: 32768.0 | grad norm: 143615.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2655/ 159576 | consumed samples: 42480 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.177E-05 | global batch size: 16 | lm loss: 6.732093E+00 | loss scale: 32768.0 | grad norm: 159462.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2656/ 159576 | consumed samples: 42496 | elapsed time per iteration (ms): 13620.1 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.660037E+00 | loss scale: 32768.0 | grad norm: 244166.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2657/ 159576 | consumed samples: 42512 | elapsed time per iteration (ms): 13831.3 | learning rate: 1.178E-05 | global batch size: 16 | lm loss: 6.626472E+00 | loss scale: 32768.0 | grad norm: 149275.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2658/ 159576 | consumed samples: 42528 | elapsed time per iteration (ms): 13824.8 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.687421E+00 | loss scale: 32768.0 | grad norm: 139977.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2659/ 159576 | consumed samples: 42544 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.179E-05 | global batch size: 16 | lm loss: 6.524724E+00 | loss scale: 32768.0 | grad norm: 106042.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2660/ 159576 | consumed samples: 42560 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.908322E+00 | loss scale: 32768.0 | grad norm: 201686.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2661/ 159576 | consumed samples: 42576 | elapsed time per iteration (ms): 13612.7 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.837928E+00 | loss scale: 32768.0 | grad norm: 126017.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2662/ 159576 | consumed samples: 42592 | elapsed time per iteration (ms): 13941.2 | learning rate: 1.180E-05 | global batch size: 16 | lm loss: 6.439098E+00 | loss scale: 32768.0 | grad norm: 160984.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2663/ 159576 | consumed samples: 42608 | elapsed time per iteration 
(ms): 13713.4 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.723923E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2664/ 159576 | consumed samples: 42624 | elapsed time per iteration (ms): 6797.7 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 7.335284E+00 | loss scale: 32768.0 | grad norm: 139598.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2665/ 159576 | consumed samples: 42640 | elapsed time per iteration (ms): 13135.0 | learning rate: 1.181E-05 | global batch size: 16 | lm loss: 6.985713E+00 | loss scale: 32768.0 | grad norm: 180390.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2666/ 159576 | consumed samples: 42656 | elapsed time per iteration (ms): 13618.0 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 6.556298E+00 | loss scale: 32768.0 | grad norm: 144470.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2667/ 159576 | consumed samples: 42672 | elapsed time per iteration (ms): 14126.5 | learning rate: 1.182E-05 | global batch size: 16 | lm loss: 7.063251E+00 | loss scale: 32768.0 | grad norm: 146115.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2668/ 159576 | consumed samples: 42688 | elapsed time per iteration (ms): 13677.8 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.846446E+00 | loss scale: 32768.0 | grad norm: 164938.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2669/ 159576 | consumed samples: 42704 | elapsed time per iteration (ms): 13662.5 | learning rate: 1.183E-05 | global batch size: 16 | lm loss: 6.704443E+00 | loss scale: 32768.0 | grad norm: 183338.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2670/ 159576 | consumed samples: 42720 | elapsed time per iteration (ms): 13752.8 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.828314E+00 | loss scale: 32768.0 | grad norm: 291659.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2671/ 159576 | consumed samples: 42736 | elapsed time per iteration (ms): 14053.5 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.701608E+00 | loss scale: 32768.0 | grad norm: 137566.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2672/ 159576 | consumed samples: 42752 | elapsed time per iteration (ms): 13555.7 | learning rate: 1.184E-05 | global batch size: 16 | lm loss: 6.495778E+00 | loss scale: 32768.0 | grad norm: 140566.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2673/ 159576 | consumed samples: 42768 | elapsed time per iteration (ms): 13625.0 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.868438E+00 | loss scale: 32768.0 | grad norm: 137822.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2674/ 159576 | consumed samples: 42784 | elapsed time per iteration (ms): 13681.3 | learning rate: 1.185E-05 | global batch size: 16 | lm loss: 6.855990E+00 | loss scale: 32768.0 | grad norm: 217925.291 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 2675/ 159576 | consumed samples: 42800 | elapsed time per iteration (ms): 13726.3 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.726338E+00 | loss scale: 32768.0 | grad norm: 169676.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2676/ 159576 | consumed samples: 42816 | elapsed time per iteration (ms): 14028.2 | learning rate: 1.186E-05 | global batch size: 16 | lm loss: 6.632861E+00 | loss scale: 32768.0 | grad norm: 146027.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2677/ 159576 | consumed samples: 42832 | elapsed time per iteration (ms): 13624.3 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.642831E+00 | loss scale: 32768.0 | grad norm: 163148.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2678/ 159576 | consumed samples: 42848 | elapsed time per iteration (ms): 13717.5 | learning rate: 1.187E-05 | global batch size: 16 | lm loss: 6.689285E+00 | loss scale: 32768.0 | grad norm: 129142.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2679/ 159576 | consumed samples: 42864 | elapsed time per iteration (ms): 13575.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.577474E+00 | loss scale: 32768.0 | grad norm: 168075.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2680/ 159576 | consumed samples: 42880 | elapsed time per iteration (ms): 13990.7 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.806996E+00 | loss scale: 32768.0 | grad norm: 138707.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2681/ 159576 | consumed samples: 42896 | elapsed time per iteration (ms): 13614.3 | learning rate: 1.188E-05 | global batch size: 16 | lm loss: 6.616170E+00 | loss scale: 32768.0 | grad norm: 138396.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2682/ 159576 | consumed samples: 42912 | elapsed time per iteration (ms): 13528.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.760321E+00 | loss scale: 32768.0 | grad norm: 146622.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2683/ 159576 | consumed samples: 42928 | elapsed time per iteration (ms): 13595.4 | learning rate: 1.189E-05 | global batch size: 16 | lm loss: 6.828167E+00 | loss scale: 32768.0 | grad norm: 205452.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2684/ 159576 | consumed samples: 42944 | elapsed time per iteration (ms): 14090.0 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.974781E+00 | loss scale: 32768.0 | grad norm: 141438.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2685/ 159576 | consumed samples: 42960 | elapsed time per iteration (ms): 13490.5 | learning rate: 1.190E-05 | global batch size: 16 | lm loss: 6.720265E+00 | loss scale: 32768.0 | grad norm: 131667.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2686/ 159576 | consumed samples: 42976 | elapsed time per iteration (ms): 13606.4 | learning rate: 1.191E-05 | global batch size: 16 | lm 
loss: 6.645846E+00 | loss scale: 32768.0 | grad norm: 143915.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2687/ 159576 | consumed samples: 42992 | elapsed time per iteration (ms): 13579.9 | learning rate: 1.191E-05 | global batch size: 16 | lm loss: 6.852206E+00 | loss scale: 32768.0 | grad norm: 206032.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2688/ 159576 | consumed samples: 43008 | elapsed time per iteration (ms): 13654.7 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.708066E+00 | loss scale: 32768.0 | grad norm: 135547.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2689/ 159576 | consumed samples: 43024 | elapsed time per iteration (ms): 13756.9 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.627333E+00 | loss scale: 32768.0 | grad norm: 103806.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2690/ 159576 | consumed samples: 43040 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.192E-05 | global batch size: 16 | lm loss: 6.624159E+00 | loss scale: 32768.0 | grad norm: 204724.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2691/ 159576 | consumed samples: 43056 | elapsed time per iteration (ms): 13656.6 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.803893E+00 | loss scale: 32768.0 | grad norm: 123248.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2692/ 159576 | consumed samples: 43072 | elapsed time per iteration (ms): 13672.9 | learning rate: 1.193E-05 | global batch size: 16 | lm loss: 6.801785E+00 | loss scale: 32768.0 | grad norm: 140785.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2693/ 159576 | consumed samples: 43088 | elapsed time per iteration (ms): 14015.4 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.464381E+00 | loss scale: 32768.0 | grad norm: 131615.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2694/ 159576 | consumed samples: 43104 | elapsed time per iteration (ms): 13588.1 | learning rate: 1.194E-05 | global batch size: 16 | lm loss: 6.727094E+00 | loss scale: 32768.0 | grad norm: 213544.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2695/ 159576 | consumed samples: 43120 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.930735E+00 | loss scale: 32768.0 | grad norm: 179180.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2696/ 159576 | consumed samples: 43136 | elapsed time per iteration (ms): 13594.8 | learning rate: 1.195E-05 | global batch size: 16 | lm loss: 6.652137E+00 | loss scale: 32768.0 | grad norm: 171091.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2697/ 159576 | consumed samples: 43152 | elapsed time per iteration (ms): 13943.3 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.731685E+00 | loss scale: 32768.0 | grad norm: 151811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2698/ 159576 | 
consumed samples: 43168 | elapsed time per iteration (ms): 13773.1 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 7.081783E+00 | loss scale: 32768.0 | grad norm: 132367.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2699/ 159576 | consumed samples: 43184 | elapsed time per iteration (ms): 13644.6 | learning rate: 1.196E-05 | global batch size: 16 | lm loss: 6.806893E+00 | loss scale: 32768.0 | grad norm: 319459.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2700/ 159576 | consumed samples: 43200 | elapsed time per iteration (ms): 13698.5 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.666497E+00 | loss scale: 32768.0 | grad norm: 120927.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2701/ 159576 | consumed samples: 43216 | elapsed time per iteration (ms): 13684.8 | learning rate: 1.197E-05 | global batch size: 16 | lm loss: 6.701412E+00 | loss scale: 32768.0 | grad norm: 150633.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2702/ 159576 | consumed samples: 43232 | elapsed time per iteration (ms): 13780.3 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.594296E+00 | loss scale: 32768.0 | grad norm: 161110.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2703/ 159576 | consumed samples: 43248 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.198E-05 | global batch size: 16 | lm loss: 6.808178E+00 | loss scale: 32768.0 | grad norm: 258358.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2704/ 159576 | consumed samples: 43264 | elapsed time per iteration (ms): 13635.4 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.815506E+00 | loss scale: 32768.0 | grad norm: 183028.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2705/ 159576 | consumed samples: 43280 | elapsed time per iteration (ms): 13605.1 | learning rate: 1.199E-05 | global batch size: 16 | lm loss: 6.967249E+00 | loss scale: 32768.0 | grad norm: 243583.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2706/ 159576 | consumed samples: 43296 | elapsed time per iteration (ms): 14130.1 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 7.062543E+00 | loss scale: 32768.0 | grad norm: 207737.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2707/ 159576 | consumed samples: 43312 | elapsed time per iteration (ms): 13561.8 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.758321E+00 | loss scale: 32768.0 | grad norm: 146527.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2708/ 159576 | consumed samples: 43328 | elapsed time per iteration (ms): 13722.0 | learning rate: 1.200E-05 | global batch size: 16 | lm loss: 6.584868E+00 | loss scale: 32768.0 | grad norm: 272015.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2709/ 159576 | consumed samples: 43344 | elapsed time per iteration (ms): 13654.1 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.709559E+00 | loss scale: 32768.0 | grad norm: 284012.046 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2710/ 159576 | consumed samples: 43360 | elapsed time per iteration (ms): 13595.7 | learning rate: 1.201E-05 | global batch size: 16 | lm loss: 6.830414E+00 | loss scale: 32768.0 | grad norm: 149403.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2711/ 159576 | consumed samples: 43376 | elapsed time per iteration (ms): 13973.4 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.624958E+00 | loss scale: 32768.0 | grad norm: 146777.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2712/ 159576 | consumed samples: 43392 | elapsed time per iteration (ms): 13700.0 | learning rate: 1.202E-05 | global batch size: 16 | lm loss: 6.735670E+00 | loss scale: 32768.0 | grad norm: 136631.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2713/ 159576 | consumed samples: 43408 | elapsed time per iteration (ms): 13572.3 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.765169E+00 | loss scale: 32768.0 | grad norm: 280479.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2714/ 159576 | consumed samples: 43424 | elapsed time per iteration (ms): 13642.4 | learning rate: 1.203E-05 | global batch size: 16 | lm loss: 6.622662E+00 | loss scale: 32768.0 | grad norm: 160875.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2715/ 159576 | consumed samples: 43440 | elapsed time per iteration (ms): 14122.3 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.730956E+00 | loss scale: 32768.0 | grad norm: 206409.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2716/ 159576 | consumed samples: 43456 | elapsed time per iteration (ms): 13831.1 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.767645E+00 | loss scale: 32768.0 | grad norm: 149352.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2717/ 159576 | consumed samples: 43472 | elapsed time per iteration (ms): 13572.9 | learning rate: 1.204E-05 | global batch size: 16 | lm loss: 6.975914E+00 | loss scale: 32768.0 | grad norm: 119850.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2718/ 159576 | consumed samples: 43488 | elapsed time per iteration (ms): 13686.9 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.919794E+00 | loss scale: 32768.0 | grad norm: 172348.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2719/ 159576 | consumed samples: 43504 | elapsed time per iteration (ms): 13976.8 | learning rate: 1.205E-05 | global batch size: 16 | lm loss: 6.652202E+00 | loss scale: 32768.0 | grad norm: 178184.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2720/ 159576 | consumed samples: 43520 | elapsed time per iteration (ms): 13571.8 | learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.787558E+00 | loss scale: 32768.0 | grad norm: 130225.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2721/ 159576 | consumed samples: 43536 | elapsed time per iteration (ms): 13693.7 | 
learning rate: 1.206E-05 | global batch size: 16 | lm loss: 6.660249E+00 | loss scale: 32768.0 | grad norm: 144428.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2722/ 159576 | consumed samples: 43552 | elapsed time per iteration (ms): 13646.9 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.661267E+00 | loss scale: 32768.0 | grad norm: 121995.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2723/ 159576 | consumed samples: 43568 | elapsed time per iteration (ms): 13718.1 | learning rate: 1.207E-05 | global batch size: 16 | lm loss: 6.702977E+00 | loss scale: 32768.0 | grad norm: 205375.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2724/ 159576 | consumed samples: 43584 | elapsed time per iteration (ms): 14072.2 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.859900E+00 | loss scale: 32768.0 | grad norm: 174185.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2725/ 159576 | consumed samples: 43600 | elapsed time per iteration (ms): 13643.1 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.642687E+00 | loss scale: 32768.0 | grad norm: 124356.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2726/ 159576 | consumed samples: 43616 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.208E-05 | global batch size: 16 | lm loss: 6.849540E+00 | loss scale: 32768.0 | grad norm: 187912.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2727/ 159576 | consumed samples: 43632 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.505477E+00 | loss scale: 32768.0 | grad norm: 146429.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2728/ 159576 | consumed samples: 43648 | elapsed time per iteration (ms): 14179.1 | learning rate: 1.209E-05 | global batch size: 16 | lm loss: 6.763928E+00 | loss scale: 32768.0 | grad norm: 143016.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2729/ 159576 | consumed samples: 43664 | elapsed time per iteration (ms): 13666.5 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.746594E+00 | loss scale: 32768.0 | grad norm: 184649.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2730/ 159576 | consumed samples: 43680 | elapsed time per iteration (ms): 13666.9 | learning rate: 1.210E-05 | global batch size: 16 | lm loss: 6.822509E+00 | loss scale: 32768.0 | grad norm: 258599.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2731/ 159576 | consumed samples: 43696 | elapsed time per iteration (ms): 13722.5 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.726813E+00 | loss scale: 32768.0 | grad norm: 135253.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2732/ 159576 | consumed samples: 43712 | elapsed time per iteration (ms): 14110.6 | learning rate: 1.211E-05 | global batch size: 16 | lm loss: 6.642574E+00 | loss scale: 32768.0 | grad norm: 187051.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 2733/ 159576 | consumed samples: 43728 | elapsed time per iteration (ms): 13665.7 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.608624E+00 | loss scale: 32768.0 | grad norm: 164163.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2734/ 159576 | consumed samples: 43744 | elapsed time per iteration (ms): 13624.6 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.755674E+00 | loss scale: 32768.0 | grad norm: 129230.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2735/ 159576 | consumed samples: 43760 | elapsed time per iteration (ms): 13617.1 | learning rate: 1.212E-05 | global batch size: 16 | lm loss: 6.771841E+00 | loss scale: 32768.0 | grad norm: 254766.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2736/ 159576 | consumed samples: 43776 | elapsed time per iteration (ms): 13675.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.677852E+00 | loss scale: 32768.0 | grad norm: 142644.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2737/ 159576 | consumed samples: 43792 | elapsed time per iteration (ms): 13983.3 | learning rate: 1.213E-05 | global batch size: 16 | lm loss: 6.719501E+00 | loss scale: 32768.0 | grad norm: 164953.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2738/ 159576 | consumed samples: 43808 | elapsed time per iteration (ms): 13774.1 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.637510E+00 | loss scale: 32768.0 | grad norm: 161949.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2739/ 159576 | consumed samples: 43824 | elapsed time per iteration (ms): 13780.8 | learning rate: 1.214E-05 | global batch size: 16 | lm loss: 6.670253E+00 | loss scale: 32768.0 | grad norm: 132053.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2740/ 159576 | consumed samples: 43840 | elapsed time per iteration (ms): 13656.5 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.701370E+00 | loss scale: 32768.0 | grad norm: 158609.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2741/ 159576 | consumed samples: 43856 | elapsed time per iteration (ms): 13970.4 | learning rate: 1.215E-05 | global batch size: 16 | lm loss: 6.676120E+00 | loss scale: 32768.0 | grad norm: 133079.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2742/ 159576 | consumed samples: 43872 | elapsed time per iteration (ms): 13572.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.666083E+00 | loss scale: 32768.0 | grad norm: 121076.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2743/ 159576 | consumed samples: 43888 | elapsed time per iteration (ms): 13635.9 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 6.594894E+00 | loss scale: 32768.0 | grad norm: 206897.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2744/ 159576 | consumed samples: 43904 | elapsed time per iteration (ms): 13681.8 | learning rate: 1.216E-05 | global batch size: 16 | lm loss: 
6.700480E+00 | loss scale: 32768.0 | grad norm: 126037.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2745/ 159576 | consumed samples: 43920 | elapsed time per iteration (ms): 13966.9 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.708483E+00 | loss scale: 32768.0 | grad norm: 136172.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2746/ 159576 | consumed samples: 43936 | elapsed time per iteration (ms): 13758.4 | learning rate: 1.217E-05 | global batch size: 16 | lm loss: 6.629419E+00 | loss scale: 32768.0 | grad norm: 142570.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2747/ 159576 | consumed samples: 43952 | elapsed time per iteration (ms): 13668.5 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.597517E+00 | loss scale: 32768.0 | grad norm: 155237.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2748/ 159576 | consumed samples: 43968 | elapsed time per iteration (ms): 13633.2 | learning rate: 1.218E-05 | global batch size: 16 | lm loss: 6.561327E+00 | loss scale: 32768.0 | grad norm: 162642.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2749/ 159576 | consumed samples: 43984 | elapsed time per iteration (ms): 13608.4 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.677460E+00 | loss scale: 32768.0 | grad norm: 192650.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2750/ 159576 | consumed samples: 44000 | elapsed time per iteration (ms): 13886.7 | learning rate: 1.219E-05 | global batch size: 16 | lm loss: 6.649335E+00 | loss scale: 32768.0 | grad norm: 171673.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2751/ 159576 | consumed samples: 44016 | elapsed time per iteration (ms): 13671.6 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.735415E+00 | loss scale: 32768.0 | grad norm: 128822.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2752/ 159576 | consumed samples: 44032 | elapsed time per iteration (ms): 13708.1 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.679979E+00 | loss scale: 32768.0 | grad norm: 253310.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2753/ 159576 | consumed samples: 44048 | elapsed time per iteration (ms): 13770.7 | learning rate: 1.220E-05 | global batch size: 16 | lm loss: 6.565764E+00 | loss scale: 32768.0 | grad norm: 116179.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2754/ 159576 | consumed samples: 44064 | elapsed time per iteration (ms): 14066.6 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.742185E+00 | loss scale: 32768.0 | grad norm: 141403.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2755/ 159576 | consumed samples: 44080 | elapsed time per iteration (ms): 13651.8 | learning rate: 1.221E-05 | global batch size: 16 | lm loss: 6.762599E+00 | loss scale: 32768.0 | grad norm: 111172.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2756/ 159576 | consumed 
samples: 44096 | elapsed time per iteration (ms): 13694.5 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.733878E+00 | loss scale: 32768.0 | grad norm: 128168.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2757/ 159576 | consumed samples: 44112 | elapsed time per iteration (ms): 13604.8 | learning rate: 1.222E-05 | global batch size: 16 | lm loss: 6.588708E+00 | loss scale: 32768.0 | grad norm: 103022.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2758/ 159576 | consumed samples: 44128 | elapsed time per iteration (ms): 13653.9 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.562719E+00 | loss scale: 32768.0 | grad norm: 138192.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2759/ 159576 | consumed samples: 44144 | elapsed time per iteration (ms): 13986.1 | learning rate: 1.223E-05 | global batch size: 16 | lm loss: 6.738625E+00 | loss scale: 32768.0 | grad norm: 121839.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2760/ 159576 | consumed samples: 44160 | elapsed time per iteration (ms): 13725.3 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.566117E+00 | loss scale: 32768.0 | grad norm: 104901.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2761/ 159576 | consumed samples: 44176 | elapsed time per iteration (ms): 13770.1 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.666871E+00 | loss scale: 32768.0 | grad norm: 123398.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2762/ 159576 | consumed samples: 44192 | elapsed time per iteration (ms): 13627.5 | learning rate: 1.224E-05 | global batch size: 16 | lm loss: 6.835371E+00 | loss scale: 32768.0 | grad norm: 112214.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2763/ 159576 | consumed samples: 44208 | elapsed time per iteration (ms): 14068.3 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.804303E+00 | loss scale: 32768.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2764/ 159576 | consumed samples: 44224 | elapsed time per iteration (ms): 6917.6 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.972560E+00 | loss scale: 16384.0 | grad norm: 122506.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2765/ 159576 | consumed samples: 44240 | elapsed time per iteration (ms): 13181.9 | learning rate: 1.225E-05 | global batch size: 16 | lm loss: 6.580292E+00 | loss scale: 16384.0 | grad norm: 59992.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2766/ 159576 | consumed samples: 44256 | elapsed time per iteration (ms): 13680.1 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.724333E+00 | loss scale: 16384.0 | grad norm: 77015.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 2767/ 159576 | consumed samples: 44272 | elapsed time per iteration (ms): 13716.6 | learning rate: 1.226E-05 | global batch size: 16 | lm loss: 6.933354E+00 | loss scale: 16384.0 | grad norm: 85522.390 | num zeros: 
iteration 2768/ 159576 | consumed samples: 44288 | elapsed time per iteration (ms): 13994.0 | learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.648163E+00 | loss scale: 16384.0 | grad norm: 58295.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2769/ 159576 | consumed samples: 44304 | elapsed time per iteration (ms): 13658.9 | learning rate: 1.227E-05 | global batch size: 16 | lm loss: 6.891530E+00 | loss scale: 16384.0 | grad norm: 75446.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2770/ 159576 | consumed samples: 44320 | elapsed time per iteration (ms): 13703.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.591332E+00 | loss scale: 16384.0 | grad norm: 59290.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2771/ 159576 | consumed samples: 44336 | elapsed time per iteration (ms): 13716.9 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.737020E+00 | loss scale: 16384.0 | grad norm: 51929.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2772/ 159576 | consumed samples: 44352 | elapsed time per iteration (ms): 14010.7 | learning rate: 1.228E-05 | global batch size: 16 | lm loss: 6.565439E+00 | loss scale: 16384.0 | grad norm: 100304.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2773/ 159576 | consumed samples: 44368 | elapsed time per iteration (ms): 13566.2 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.887408E+00 | loss scale: 16384.0 | grad norm: 86699.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2774/ 159576 | consumed samples: 44384 | elapsed time per iteration (ms): 13639.1 | learning rate: 1.229E-05 | global batch size: 16 | lm loss: 6.766156E+00 | loss scale: 16384.0 | grad norm: 64840.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2775/ 159576 | consumed samples: 44400 | elapsed time per iteration (ms): 13646.1 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.640082E+00 | loss scale: 16384.0 | grad norm: 61943.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2776/ 159576 | consumed samples: 44416 | elapsed time per iteration (ms): 13670.4 | learning rate: 1.230E-05 | global batch size: 16 | lm loss: 6.784959E+00 | loss scale: 16384.0 | grad norm: 68978.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2777/ 159576 | consumed samples: 44432 | elapsed time per iteration (ms): 14012.8 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.670368E+00 | loss scale: 16384.0 | grad norm: 58668.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2778/ 159576 | consumed samples: 44448 | elapsed time per iteration (ms): 13651.5 | learning rate: 1.231E-05 | global batch size: 16 | lm loss: 6.849538E+00 | loss scale: 16384.0 | grad norm: 53539.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2779/ 159576 | consumed samples: 44464 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.710807E+00 | loss scale: 16384.0 | grad norm: 58047.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2780/ 159576 | consumed samples: 44480 | elapsed time per iteration (ms): 13601.2 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.803576E+00 | loss scale: 16384.0 | grad norm: 61014.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2781/ 159576 | consumed samples: 44496 | elapsed time per iteration (ms): 14011.6 | learning rate: 1.232E-05 | global batch size: 16 | lm loss: 6.435648E+00 | loss scale: 16384.0 | grad norm: 72928.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2782/ 159576 | consumed samples: 44512 | elapsed time per iteration (ms): 13706.9 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.689322E+00 | loss scale: 16384.0 | grad norm: 45124.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2783/ 159576 | consumed samples: 44528 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.233E-05 | global batch size: 16 | lm loss: 6.796506E+00 | loss scale: 16384.0 | grad norm: 61254.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2784/ 159576 | consumed samples: 44544 | elapsed time per iteration (ms): 13617.3 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.726316E+00 | loss scale: 16384.0 | grad norm: 58102.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2785/ 159576 | consumed samples: 44560 | elapsed time per iteration (ms): 13946.8 | learning rate: 1.234E-05 | global batch size: 16 | lm loss: 6.648038E+00 | loss scale: 16384.0 | grad norm: 68282.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2786/ 159576 | consumed samples: 44576 | elapsed time per iteration (ms): 13594.9 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.860110E+00 | loss scale: 16384.0 | grad norm: 70475.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2787/ 159576 | consumed samples: 44592 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.235E-05 | global batch size: 16 | lm loss: 6.821939E+00 | loss scale: 16384.0 | grad norm: 56499.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2788/ 159576 | consumed samples: 44608 | elapsed time per iteration (ms): 13592.1 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.702363E+00 | loss scale: 16384.0 | grad norm: 71878.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2789/ 159576 | consumed samples: 44624 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.596258E+00 | loss scale: 16384.0 | grad norm: 57167.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2790/ 159576 | consumed samples: 44640 | elapsed time per iteration (ms): 13806.2 | learning rate: 1.236E-05 | global batch size: 16 | lm loss: 6.742100E+00 | loss scale: 16384.0 | grad norm: 78591.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2791/ 159576 | consumed samples: 44656 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.602869E+00 | loss scale: 16384.0 | grad norm: 68726.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2792/ 159576 | consumed samples: 44672 | elapsed time per iteration (ms): 13592.2 | learning rate: 1.237E-05 | global batch size: 16 | lm loss: 6.708993E+00 | loss scale: 16384.0 | grad norm: 98214.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2793/ 159576 | consumed samples: 44688 | elapsed time per iteration (ms): 13507.3 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 6.616965E+00 | loss scale: 16384.0 | grad norm: 72150.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2794/ 159576 | consumed samples: 44704 | elapsed time per iteration (ms): 13955.1 | learning rate: 1.238E-05 | global batch size: 16 | lm loss: 6.607640E+00 | loss scale: 16384.0 | grad norm: 62728.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2795/ 159576 | consumed samples: 44720 | elapsed time per iteration (ms): 13531.1 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.875388E+00 | loss scale: 16384.0 | grad norm: 94768.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2796/ 159576 | consumed samples: 44736 | elapsed time per iteration (ms): 13614.2 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.827682E+00 | loss scale: 16384.0 | grad norm: 59818.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2797/ 159576 | consumed samples: 44752 | elapsed time per iteration (ms): 13620.6 | learning rate: 1.239E-05 | global batch size: 16 | lm loss: 6.522869E+00 | loss scale: 16384.0 | grad norm: 74009.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2798/ 159576 | consumed samples: 44768 | elapsed time per iteration (ms): 13985.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.654684E+00 | loss scale: 16384.0 | grad norm: 54913.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2799/ 159576 | consumed samples: 44784 | elapsed time per iteration (ms): 13759.4 | learning rate: 1.240E-05 | global batch size: 16 | lm loss: 6.544140E+00 | loss scale: 16384.0 | grad norm: 83654.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2800/ 159576 | consumed samples: 44800 | elapsed time per iteration (ms): 13524.0 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.798269E+00 | loss scale: 16384.0 | grad norm: 80678.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2801/ 159576 | consumed samples: 44816 | elapsed time per iteration (ms): 13646.5 | learning rate: 1.241E-05 | global batch size: 16 | lm loss: 6.872281E+00 | loss scale: 16384.0 | grad norm: 49084.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2802/ 159576 | consumed samples: 44832 | elapsed time per iteration (ms): 13614.0 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.733764E+00 | loss scale: 16384.0 | grad norm: 88585.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2803/ 159576 | consumed samples: 44848 | elapsed time per iteration (ms): 13792.4 | learning rate: 1.242E-05 | global batch size: 16 | lm loss: 6.865559E+00 | loss scale: 16384.0 | grad norm: 48186.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2804/ 159576 | consumed samples: 44864 | elapsed time per iteration (ms): 13655.0 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.631515E+00 | loss scale: 16384.0 | grad norm: 66281.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2805/ 159576 | consumed samples: 44880 | elapsed time per iteration (ms): 13605.4 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.593436E+00 | loss scale: 16384.0 | grad norm: 66274.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2806/ 159576 | consumed samples: 44896 | elapsed time per iteration (ms): 13611.6 | learning rate: 1.243E-05 | global batch size: 16 | lm loss: 6.692297E+00 | loss scale: 16384.0 | grad norm: 66535.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2807/ 159576 | consumed samples: 44912 | elapsed time per iteration (ms): 13924.4 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.564488E+00 | loss scale: 16384.0 | grad norm: 62289.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2808/ 159576 | consumed samples: 44928 | elapsed time per iteration (ms): 13559.5 | learning rate: 1.244E-05 | global batch size: 16 | lm loss: 6.775381E+00 | loss scale: 16384.0 | grad norm: 51114.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2809/ 159576 | consumed samples: 44944 | elapsed time per iteration (ms): 13579.6 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.854599E+00 | loss scale: 16384.0 | grad norm: 78574.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2810/ 159576 | consumed samples: 44960 | elapsed time per iteration (ms): 13568.8 | learning rate: 1.245E-05 | global batch size: 16 | lm loss: 6.641658E+00 | loss scale: 16384.0 | grad norm: 48054.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2811/ 159576 | consumed samples: 44976 | elapsed time per iteration (ms): 13577.2 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.804714E+00 | loss scale: 16384.0 | grad norm: 85293.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2812/ 159576 | consumed samples: 44992 | elapsed time per iteration (ms): 13780.4 | learning rate: 1.246E-05 | global batch size: 16 | lm loss: 6.484572E+00 | loss scale: 16384.0 | grad norm: 54599.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2813/ 159576 | consumed samples: 45008 | elapsed time per iteration (ms): 13630.2 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.495656E+00 | loss scale: 16384.0 | grad norm: 131722.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2814/ 159576 | consumed samples: 45024 | elapsed time per iteration (ms): 13626.8 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.894939E+00 | loss scale: 16384.0 | grad norm: 102881.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2815/ 159576 | consumed samples: 45040 | elapsed time per iteration (ms): 13599.0 | learning rate: 1.247E-05 | global batch size: 16 | lm loss: 6.883965E+00 | loss scale: 16384.0 | grad norm: 72100.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2816/ 159576 | consumed samples: 45056 | elapsed time per iteration (ms): 14052.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.573022E+00 | loss scale: 16384.0 | grad norm: 72968.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2817/ 159576 | consumed samples: 45072 | elapsed time per iteration (ms): 13541.1 | learning rate: 1.248E-05 | global batch size: 16 | lm loss: 6.646833E+00 | loss scale: 16384.0 | grad norm: 90510.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2818/ 159576 | consumed samples: 45088 | elapsed time per iteration (ms): 13597.7 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.898618E+00 | loss scale: 16384.0 | grad norm: 90037.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2819/ 159576 | consumed samples: 45104 | elapsed time per iteration (ms): 13575.0 | learning rate: 1.249E-05 | global batch size: 16 | lm loss: 6.547668E+00 | loss scale: 16384.0 | grad norm: 79277.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2820/ 159576 | consumed samples: 45120 | elapsed time per iteration (ms): 14016.3 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.791230E+00 | loss scale: 16384.0 | grad norm: 63437.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2821/ 159576 | consumed samples: 45136 | elapsed time per iteration (ms): 13565.5 | learning rate: 1.250E-05 | global batch size: 16 | lm loss: 6.957808E+00 | loss scale: 16384.0 | grad norm: 56738.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2822/ 159576 | consumed samples: 45152 | elapsed time per iteration (ms): 13564.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.729958E+00 | loss scale: 16384.0 | grad norm: 93778.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2823/ 159576 | consumed samples: 45168 | elapsed time per iteration (ms): 13650.0 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.480144E+00 | loss scale: 16384.0 | grad norm: 60246.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2824/ 159576 | consumed samples: 45184 | elapsed time per iteration (ms): 13511.5 | learning rate: 1.251E-05 | global batch size: 16 | lm loss: 6.595847E+00 | loss scale: 16384.0 | grad norm: 63557.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2825/ 159576 | consumed samples: 45200 | elapsed time per iteration (ms): 13655.5 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689149E+00 | loss scale: 16384.0 | grad norm: 67372.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2826/ 159576 | consumed samples: 45216 | elapsed time per iteration (ms): 13638.0 | learning rate: 1.252E-05 | global batch size: 16 | lm loss: 6.689507E+00 | loss scale: 16384.0 | grad norm: 69124.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2827/ 159576 | consumed samples: 45232 | elapsed time per iteration (ms): 13546.1 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.457958E+00 | loss scale: 16384.0 | grad norm: 56160.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2828/ 159576 | consumed samples: 45248 | elapsed time per iteration (ms): 13610.9 | learning rate: 1.253E-05 | global batch size: 16 | lm loss: 6.815155E+00 | loss scale: 16384.0 | grad norm: 61009.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2829/ 159576 | consumed samples: 45264 | elapsed time per iteration (ms): 13930.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.595886E+00 | loss scale: 16384.0 | grad norm: 45906.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2830/ 159576 | consumed samples: 45280 | elapsed time per iteration (ms): 13608.1 | learning rate: 1.254E-05 | global batch size: 16 | lm loss: 6.642846E+00 | loss scale: 16384.0 | grad norm: 74796.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2831/ 159576 | consumed samples: 45296 | elapsed time per iteration (ms): 13539.5 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.810493E+00 | loss scale: 16384.0 | grad norm: 64536.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2832/ 159576 | consumed samples: 45312 | elapsed time per iteration (ms): 13571.9 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.742997E+00 | loss scale: 16384.0 | grad norm: 54697.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2833/ 159576 | consumed samples: 45328 | elapsed time per iteration (ms): 13626.3 | learning rate: 1.255E-05 | global batch size: 16 | lm loss: 6.734198E+00 | loss scale: 16384.0 | grad norm: 124253.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2834/ 159576 | consumed samples: 45344 | elapsed time per iteration (ms): 13905.8 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.652834E+00 | loss scale: 16384.0 | grad norm: 71659.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 13:06:39] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 13:06:39] PULSE: tr8-104B is running for 7:14:28 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 2835/ 159576 | consumed samples: 45360 | elapsed time per iteration (ms): 13609.5 | learning rate: 1.256E-05 | global batch size: 16 | lm loss: 6.789959E+00 | loss scale: 16384.0 | grad norm: 73488.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
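The PULSE lines above are SLURM status heartbeats interleaved with the training records. When working with a stretch of log like this, a throwaway parser that keeps only the fields that change between records is handy; the field names below are taken from the log format itself, and the helper is illustrative only:

import re

# Throwaway parser for the iteration records above; turns each record into a
# dict of the values that actually change from step to step.
RECORD = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \| consumed samples: (\d+) \|"
    r".*?learning rate: ([\d.E+-]+) \| global batch size: (\d+) \|"
    r" lm loss: ([\d.E+-]+) \| loss scale: ([\d.]+) \| grad norm: ([\d.]+)"
)

def parse(line: str):
    m = RECORD.search(line)
    if not m:
        return None  # e.g. PULSE status lines
    it, samples, lr, gbs, loss, scale, gnorm = m.groups()
    return {"iteration": int(it), "consumed_samples": int(samples),
            "lr": float(lr), "global_batch_size": int(gbs),
            "lm_loss": float(loss), "loss_scale": float(scale),
            "grad_norm": float(gnorm)}

Feeding each log line through parse() yields one dict per iteration and None for the PULSE lines, which is enough to plot lm loss or grad norm against iteration.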
iteration 2836/ 159576 | consumed samples: 45376 | elapsed time per iteration (ms): 13614.7 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.695529E+00 | loss scale: 16384.0 | grad norm: 69307.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2837/ 159576 | consumed samples: 45392 | elapsed time per iteration (ms): 13634.1 | learning rate: 1.257E-05 | global batch size: 16 | lm loss: 6.550642E+00 | loss scale: 16384.0 | grad norm: 88157.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2838/ 159576 | consumed samples: 45408 | elapsed time per iteration (ms): 14029.3 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 6.745864E+00 | loss scale: 16384.0 | grad norm: 79032.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2839/ 159576 | consumed samples: 45424 | elapsed time per iteration (ms): 13631.7 | learning rate: 1.258E-05 | global batch size: 16 | lm loss: 7.013217E+00 | loss scale: 16384.0 | grad norm: 90598.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2840/ 159576 | consumed samples: 45440 | elapsed time per iteration (ms): 13552.2 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.791473E+00 | loss scale: 16384.0 | grad norm: 66761.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2841/ 159576 | consumed samples: 45456 | elapsed time per iteration (ms): 13585.4 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.639102E+00 | loss scale: 16384.0 | grad norm: 75945.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2842/ 159576 | consumed samples: 45472 | elapsed time per iteration (ms): 14005.5 | learning rate: 1.259E-05 | global batch size: 16 | lm loss: 6.750570E+00 | loss scale: 16384.0 | grad norm: 52422.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2843/ 159576 | consumed samples: 45488 | elapsed time per iteration (ms): 13637.6 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.761233E+00 | loss scale: 16384.0 | grad norm: 96201.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2844/ 159576 | consumed samples: 45504 | elapsed time per iteration (ms): 13605.0 | learning rate: 1.260E-05 | global batch size: 16 | lm loss: 6.869712E+00 | loss scale: 16384.0 | grad norm: 85259.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2845/ 159576 | consumed samples: 45520 | elapsed time per iteration (ms): 13489.6 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.754227E+00 | loss scale: 16384.0 | grad norm: 71430.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2846/ 159576 | consumed samples: 45536 | elapsed time per iteration (ms): 13633.0 | learning rate: 1.261E-05 | global batch size: 16 | lm loss: 6.681328E+00 | loss scale: 16384.0 | grad norm: 64498.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2847/ 159576 | consumed samples: 45552 | elapsed time per iteration (ms): 13680.5 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 99300.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2848/ 159576 | consumed samples: 45568 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.262E-05 | global batch size: 16 | lm loss: 6.689048E+00 | loss scale: 16384.0 | grad norm: 90482.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2849/ 159576 | consumed samples: 45584 | elapsed time per iteration (ms): 13613.6 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.673044E+00 | loss scale: 16384.0 | grad norm: 59461.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2850/ 159576 | consumed samples: 45600 | elapsed time per iteration (ms): 13675.0 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.738005E+00 | loss scale: 16384.0 | grad norm: 101125.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2851/ 159576 | consumed samples: 45616 | elapsed time per iteration (ms): 13897.5 | learning rate: 1.263E-05 | global batch size: 16 | lm loss: 6.522173E+00 | loss scale: 16384.0 | grad norm: 90321.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2852/ 159576 | consumed samples: 45632 | elapsed time per iteration (ms): 13599.3 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.524035E+00 | loss scale: 16384.0 | grad norm: 70117.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2853/ 159576 | consumed samples: 45648 | elapsed time per iteration (ms): 13643.7 | learning rate: 1.264E-05 | global batch size: 16 | lm loss: 6.510409E+00 | loss scale: 16384.0 | grad norm: 64993.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2854/ 159576 | consumed samples: 45664 | elapsed time per iteration (ms): 13552.1 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.913634E+00 | loss scale: 16384.0 | grad norm: 106101.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2855/ 159576 | consumed samples: 45680 | elapsed time per iteration (ms): 13759.3 | learning rate: 1.265E-05 | global batch size: 16 | lm loss: 6.640407E+00 | loss scale: 16384.0 | grad norm: 114581.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2856/ 159576 | consumed samples: 45696 | elapsed time per iteration (ms): 13808.3 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.781041E+00 | loss scale: 16384.0 | grad norm: 56604.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2857/ 159576 | consumed samples: 45712 | elapsed time per iteration (ms): 13620.2 | learning rate: 1.266E-05 | global batch size: 16 | lm loss: 6.794811E+00 | loss scale: 16384.0 | grad norm: 60150.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2858/ 159576 | consumed samples: 45728 | elapsed time per iteration (ms): 13675.9 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.586791E+00 | loss scale: 16384.0 | grad norm: 100786.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2859/ 159576 | consumed samples: 45744 | elapsed time per iteration (ms): 13583.4 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.762810E+00 | loss scale: 16384.0 | grad norm: 82968.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2860/ 159576 | consumed samples: 45760 | elapsed time per iteration (ms): 13906.7 | learning rate: 1.267E-05 | global batch size: 16 | lm loss: 6.739496E+00 | loss scale: 16384.0 | grad norm: 51306.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2861/ 159576 | consumed samples: 45776 | elapsed time per iteration (ms): 13619.1 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.046006E+00 | loss scale: 16384.0 | grad norm: 70726.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2862/ 159576 | consumed samples: 45792 | elapsed time per iteration (ms): 13544.2 | learning rate: 1.268E-05 | global batch size: 16 | lm loss: 6.803837E+00 | loss scale: 16384.0 | grad norm: 68740.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2863/ 159576 | consumed samples: 45808 | elapsed time per iteration (ms): 13610.8 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.770112E+00 | loss scale: 16384.0 | grad norm: 139814.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2864/ 159576 | consumed samples: 45824 | elapsed time per iteration (ms): 13958.0 | learning rate: 1.269E-05 | global batch size: 16 | lm loss: 6.750904E+00 | loss scale: 16384.0 | grad norm: 77621.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2865/ 159576 | consumed samples: 45840 | elapsed time per iteration (ms): 13670.7 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.696413E+00 | loss scale: 16384.0 | grad norm: 71170.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2866/ 159576 | consumed samples: 45856 | elapsed time per iteration (ms): 13638.6 | learning rate: 1.270E-05 | global batch size: 16 | lm loss: 6.704915E+00 | loss scale: 16384.0 | grad norm: 101640.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2867/ 159576 | consumed samples: 45872 | elapsed time per iteration (ms): 13607.2 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.825719E+00 | loss scale: 16384.0 | grad norm: 75740.165 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2868/ 159576 | consumed samples: 45888 | elapsed time per iteration (ms): 13630.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.287379E+00 | loss scale: 16384.0 | grad norm: 102389.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2869/ 159576 | consumed samples: 45904 | elapsed time per iteration (ms): 13745.4 | learning rate: 1.271E-05 | global batch size: 16 | lm loss: 6.541815E+00 | loss scale: 16384.0 | grad norm: 70149.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2870/ 159576 | consumed samples: 45920 | elapsed time per iteration (ms): 13607.8 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.516257E+00 | loss scale: 16384.0 | grad norm: 75996.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2871/ 159576 | consumed samples: 45936 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.272E-05 | global batch size: 16 | lm loss: 6.478125E+00 | loss scale: 16384.0 | grad norm: 71923.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2872/ 159576 | consumed samples: 45952 | elapsed time per iteration (ms): 13608.0 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.691109E+00 | loss scale: 16384.0 | grad norm: 87426.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2873/ 159576 | consumed samples: 45968 | elapsed time per iteration (ms): 13976.7 | learning rate: 1.273E-05 | global batch size: 16 | lm loss: 6.620930E+00 | loss scale: 16384.0 | grad norm: 104041.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2874/ 159576 | consumed samples: 45984 | elapsed time per iteration (ms): 13607.9 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.744573E+00 | loss scale: 16384.0 | grad norm: 69927.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2875/ 159576 | consumed samples: 46000 | elapsed time per iteration (ms): 13661.2 | learning rate: 1.274E-05 | global batch size: 16 | lm loss: 6.676423E+00 | loss scale: 16384.0 | grad norm: 51002.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2876/ 159576 | consumed samples: 46016 | elapsed time per iteration (ms): 13531.2 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.802640E+00 | loss scale: 16384.0 | grad norm: 87004.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2877/ 159576 | consumed samples: 46032 | elapsed time per iteration (ms): 13901.7 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.729659E+00 | loss scale: 16384.0 | grad norm: 50767.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2878/ 159576 | consumed samples: 46048 | elapsed time per iteration (ms): 13702.1 | learning rate: 1.275E-05 | global batch size: 16 | lm loss: 6.922673E+00 | loss scale: 16384.0 | grad norm: 121433.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2879/ 159576 | consumed samples: 46064 | elapsed time per iteration (ms): 13605.9 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.701990E+00 | loss scale: 16384.0 | grad norm: 78796.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2880/ 159576 | consumed samples: 46080 | elapsed time per iteration (ms): 13615.6 | learning rate: 1.276E-05 | global batch size: 16 | lm loss: 6.650718E+00 | loss scale: 16384.0 | grad norm: 68193.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2881/ 159576 | consumed samples: 46096 | elapsed time per iteration (ms): 13595.5 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.732479E+00 | loss scale: 16384.0 | grad norm: 69049.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2882/ 159576 | consumed samples: 46112 | elapsed time per iteration (ms): 13888.6 | learning rate: 1.277E-05 | global batch size: 16 | lm loss: 6.563155E+00 | loss scale: 16384.0 | grad norm: 84383.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2883/ 159576 | consumed samples: 46128 | elapsed time per iteration (ms): 13560.8 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.406487E+00 | loss scale: 16384.0 | grad norm: 66632.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2884/ 159576 | consumed samples: 46144 | elapsed time per iteration (ms): 13502.0 | learning rate: 1.278E-05 | global batch size: 16 | lm loss: 6.748409E+00 | loss scale: 16384.0 | grad norm: 69626.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2885/ 159576 | consumed samples: 46160 | elapsed time per iteration (ms): 13526.3 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.474768E+00 | loss scale: 16384.0 | grad norm: 43811.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2886/ 159576 | consumed samples: 46176 | elapsed time per iteration (ms): 13863.4 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.661960E+00 | loss scale: 16384.0 | grad norm: 71612.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2887/ 159576 | consumed samples: 46192 | elapsed time per iteration (ms): 13578.7 | learning rate: 1.279E-05 | global batch size: 16 | lm loss: 6.511534E+00 | loss scale: 16384.0 | grad norm: 60456.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2888/ 159576 | consumed samples: 46208 | elapsed time per iteration (ms): 13588.8 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.689698E+00 | loss scale: 16384.0 | grad norm: 101410.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2889/ 159576 | consumed samples: 46224 | elapsed time per iteration (ms): 13621.2 | learning rate: 1.280E-05 | global batch size: 16 | lm loss: 6.679986E+00 | loss scale: 16384.0 | grad norm: 74313.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2890/ 159576 | consumed samples: 46240 | elapsed time per iteration (ms): 13599.6 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.579202E+00 | loss scale: 16384.0 | grad norm: 53116.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2891/ 159576 | consumed samples: 46256 | elapsed time per iteration (ms): 13965.8 | learning rate: 1.281E-05 | global batch size: 16 | lm loss: 6.841757E+00 | loss scale: 16384.0 | grad norm: 71980.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2892/ 159576 | consumed samples: 46272 | elapsed time per iteration (ms): 13517.0 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.555973E+00 | loss scale: 16384.0 | grad norm: 90572.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2893/ 159576 | consumed samples: 46288 | elapsed time per iteration (ms): 13525.5 | learning rate: 1.282E-05 | global batch size: 16 | lm loss: 6.857316E+00 | loss scale: 16384.0 | grad norm: 60488.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2894/ 159576 | consumed samples: 46304 | elapsed time per iteration (ms): 13541.9 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.685534E+00 | loss scale: 16384.0 | grad norm: 69134.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2895/ 159576 | consumed samples: 46320 | elapsed time per iteration (ms): 14148.5 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.805571E+00 | loss scale: 16384.0 | grad norm: 57858.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2896/ 159576 | consumed samples: 46336 | elapsed time per iteration (ms): 13614.8 | learning rate: 1.283E-05 | global batch size: 16 | lm loss: 6.839938E+00 | loss scale: 16384.0 | grad norm: 146916.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2897/ 159576 | consumed samples: 46352 | elapsed time per iteration (ms): 13601.5 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 6.725083E+00 | loss scale: 16384.0 | grad norm: 101921.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2898/ 159576 | consumed samples: 46368 | elapsed time per iteration (ms): 13584.0 | learning rate: 1.284E-05 | global batch size: 16 | lm loss: 7.088351E+00 | loss scale: 16384.0 | grad norm: 78883.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2899/ 159576 | consumed samples: 46384 | elapsed time per iteration (ms): 14019.6 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.874489E+00 | loss scale: 16384.0 | grad norm: 79406.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2900/ 159576 | consumed samples: 46400 | elapsed time per iteration (ms): 13571.5 | learning rate: 1.285E-05 | global batch size: 16 | lm loss: 6.735637E+00 | loss scale: 16384.0 | grad norm: 58170.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2901/ 159576 | consumed samples: 46416 | elapsed time per iteration (ms): 13559.8 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.789194E+00 | loss scale: 16384.0 | grad norm: 153130.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2902/ 159576 | consumed samples: 46432 | elapsed time per iteration (ms): 13570.5 | learning rate: 1.286E-05 | global batch size: 16 | lm loss: 6.734316E+00 | loss scale: 16384.0 | grad norm: 116070.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2903/ 159576 | consumed samples: 46448 | elapsed time per iteration (ms): 13629.7 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.743185E+00 | loss scale: 16384.0 | grad norm: 76970.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2904/ 159576 | consumed samples: 46464 | elapsed time per iteration (ms): 13980.9 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.742231E+00 | loss scale: 16384.0 | grad norm: 79904.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2905/ 159576 | consumed samples: 46480 | elapsed time per iteration (ms): 13647.6 | learning rate: 1.287E-05 | global batch size: 16 | lm loss: 6.785865E+00 | loss scale: 16384.0 | grad norm: 66541.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2906/ 159576 | consumed samples: 46496 | elapsed time per iteration (ms): 13586.1 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.669911E+00 | loss scale: 16384.0 | grad norm: 76560.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2907/ 159576 | consumed samples: 46512 | elapsed time per iteration (ms): 13521.3 | learning rate: 1.288E-05 | global batch size: 16 | lm loss: 6.723244E+00 | loss scale: 16384.0 | grad norm: 103466.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2908/ 159576 | consumed samples: 46528 | elapsed time per iteration (ms): 13824.4 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.584032E+00 | loss scale: 16384.0 | grad norm: 73252.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2909/ 159576 | consumed samples: 46544 | elapsed time per iteration (ms): 13578.9 | learning rate: 1.289E-05 | global batch size: 16 | lm loss: 6.804316E+00 | loss scale: 16384.0 | grad norm: 70073.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2910/ 159576 | consumed samples: 46560 | elapsed time per iteration (ms): 13556.4 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.673604E+00 | loss scale: 16384.0 | grad norm: 109090.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2911/ 159576 | consumed samples: 46576 | elapsed time per iteration (ms): 13604.0 | learning rate: 1.290E-05 | global batch size: 16 | lm loss: 6.599095E+00 | loss scale: 16384.0 | grad norm: 57781.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2912/ 159576 | consumed samples: 46592 | elapsed time per iteration (ms): 13587.1 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.753370E+00 | loss scale: 16384.0 | grad norm: 76832.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2913/ 159576 | consumed samples: 46608 | elapsed time per iteration (ms): 13861.5 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.854298E+00 | loss scale: 16384.0 | grad norm: 72132.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2914/ 159576 | consumed samples: 46624 | elapsed time per iteration (ms): 13559.0 | learning rate: 1.291E-05 | global batch size: 16 | lm loss: 6.579864E+00 | loss scale: 16384.0 | grad norm: 74308.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2915/ 159576 | consumed samples: 46640 | elapsed time per iteration (ms): 13594.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.756865E+00 | loss scale: 16384.0 | grad norm: 54456.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2916/ 159576 | consumed samples: 46656 | elapsed time per iteration (ms): 13569.5 | learning rate: 1.292E-05 | global batch size: 16 | lm loss: 6.743901E+00 | loss scale: 16384.0 | grad norm: 55395.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2917/ 159576 | consumed samples: 46672 | elapsed time per iteration (ms): 13964.6 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.671132E+00 | loss scale: 16384.0 | grad norm: 82925.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2918/ 159576 | consumed samples: 46688 | elapsed time per iteration (ms): 13641.5 | learning rate: 1.293E-05 | global batch size: 16 | lm loss: 6.554927E+00 | loss scale: 16384.0 | grad norm: 64164.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2919/ 159576 | consumed samples: 46704 | elapsed time per iteration (ms): 13635.2 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.848719E+00 | loss scale: 16384.0 | grad norm: 67718.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2920/ 159576 | consumed samples: 46720 | elapsed time per iteration (ms): 13603.6 | learning rate: 1.294E-05 | global batch size: 16 | lm loss: 6.609835E+00 | loss scale: 16384.0 | grad norm: 64921.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2921/ 159576 | consumed samples: 46736 | elapsed time per iteration (ms): 13865.5 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.699195E+00 | loss scale: 16384.0 | grad norm: 76865.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2922/ 159576 | consumed samples: 46752 | elapsed time per iteration (ms): 13659.4 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.821632E+00 | loss scale: 16384.0 | grad norm: 105825.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2923/ 159576 | consumed samples: 46768 | elapsed time per iteration (ms): 13539.7 | learning rate: 1.295E-05 | global batch size: 16 | lm loss: 6.632296E+00 | loss scale: 16384.0 | grad norm: 85548.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2924/ 159576 | consumed samples: 46784 | elapsed time per iteration (ms): 13587.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.782111E+00 | loss scale: 16384.0 | grad norm: 64005.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2925/ 159576 | consumed samples: 46800 | elapsed time per iteration (ms): 13566.6 | learning rate: 1.296E-05 | global batch size: 16 | lm loss: 6.513734E+00 | loss scale: 16384.0 | grad norm: 74875.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2926/ 159576 | consumed samples: 46816 | elapsed time per iteration (ms): 13817.4 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 6.610899E+00 | loss scale: 16384.0 | grad norm: 69678.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2927/ 159576 | consumed samples: 46832 | elapsed time per iteration (ms): 13615.5 | learning rate: 1.297E-05 | global batch size: 16 | lm loss: 7.086233E+00 | loss scale: 16384.0 | grad norm: 70522.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2928/ 159576 | consumed samples: 46848 | elapsed time per iteration (ms): 13566.8 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.598146E+00 | loss scale: 16384.0 | grad norm: 103276.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2929/ 159576 | consumed samples: 46864 | elapsed time per iteration (ms): 13567.1 | learning rate: 1.298E-05 | global batch size: 16 | lm loss: 6.593244E+00 | loss scale: 16384.0 | grad norm: 78523.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2930/ 159576 | consumed samples: 46880 | elapsed time per iteration (ms): 13919.4 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.528622E+00 | loss scale: 16384.0 | grad norm: 82737.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2931/ 159576 | consumed samples: 46896 | elapsed time per iteration (ms): 13557.6 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.605000E+00 | loss scale: 16384.0 | grad norm: 68077.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2932/ 159576 | consumed samples: 46912 | elapsed time per iteration (ms): 13570.1 | learning rate: 1.299E-05 | global batch size: 16 | lm loss: 6.595417E+00 | loss scale: 16384.0 | grad norm: 84602.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2933/ 159576 | consumed samples: 46928 | elapsed time per iteration (ms): 13606.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.730010E+00 | loss scale: 16384.0 | grad norm: 85745.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2934/ 159576 | consumed samples: 46944 | elapsed time per iteration (ms): 13584.8 | learning rate: 1.300E-05 | global batch size: 16 | lm loss: 6.689770E+00 | loss scale: 16384.0 | grad norm: 62655.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2935/ 159576 | consumed samples: 46960 | elapsed time per iteration (ms): 14053.4 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.715128E+00 | loss scale: 16384.0 | grad norm: 65695.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2936/ 159576 | consumed samples: 46976 | elapsed time per iteration (ms): 13589.9 | learning rate: 1.301E-05 | global batch size: 16 | lm loss: 6.651369E+00 | loss scale: 16384.0 | grad norm: 55322.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2937/ 159576 | consumed samples: 46992 | elapsed time per iteration (ms): 13553.6 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.646598E+00 | loss scale: 16384.0 | grad norm: 105686.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2938/ 159576 | consumed samples: 47008 | elapsed time per iteration (ms): 13584.5 | learning rate: 1.302E-05 | global batch size: 16 | lm loss: 6.798124E+00 | loss scale: 16384.0 | grad norm: 62478.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2939/ 159576 | consumed samples: 47024 | elapsed time per iteration (ms): 13902.5 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.594469E+00 | loss scale: 16384.0 | grad norm: 66128.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2940/ 159576 | consumed samples: 47040 | elapsed time per iteration (ms): 13632.4 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.642596E+00 | loss scale: 16384.0 | grad norm: 70291.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2941/ 159576 | consumed samples: 47056 | elapsed time per iteration (ms): 13595.9 | learning rate: 1.303E-05 | global batch size: 16 | lm loss: 6.428228E+00 | loss scale: 16384.0 | grad norm: 88273.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2942/ 159576 | consumed samples: 47072 | elapsed time per iteration (ms): 13622.0 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.776118E+00 | loss scale: 16384.0 | grad norm: 66140.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2943/ 159576 | consumed samples: 47088 | elapsed time per iteration (ms): 13949.2 | learning rate: 1.304E-05 | global batch size: 16 | lm loss: 6.678353E+00 | loss scale: 16384.0 | grad norm: 68411.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2944/ 159576 | consumed samples: 47104 | elapsed time per iteration (ms): 13581.2 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.679141E+00 | loss scale: 16384.0 | grad norm: 85622.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2945/ 159576 | consumed samples: 47120 | elapsed time per iteration (ms): 13544.3 | learning rate: 1.305E-05 | global batch size: 16 | lm loss: 6.620451E+00 | loss scale: 16384.0 | grad norm: 62226.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2946/ 159576 | consumed samples: 47136 | elapsed time per iteration (ms): 13593.9 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.719603E+00 | loss scale: 16384.0 | grad norm: 90885.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2947/ 159576 | consumed samples: 47152 | elapsed time per iteration (ms): 13604.3 | learning rate: 1.306E-05 | global batch size: 16 | lm loss: 6.704114E+00 | loss scale: 16384.0 | grad norm: 67182.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2948/ 159576 | consumed samples: 47168 | elapsed time per iteration (ms): 13746.5 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.781267E+00 | loss scale: 16384.0 | grad norm: 85616.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2949/ 159576 | consumed samples: 47184 | elapsed time per iteration (ms): 13612.1 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.878286E+00 | loss scale: 16384.0 | grad norm: 83807.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2950/ 159576 | consumed samples: 47200 | elapsed time per iteration (ms): 13656.8 | learning rate: 1.307E-05 | global batch size: 16 | lm loss: 6.808831E+00 | loss scale: 16384.0 | grad norm: 99669.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2951/ 159576 | consumed samples: 47216 | elapsed time per iteration (ms): 13662.4 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.751644E+00 | loss scale: 16384.0 | grad norm: 60477.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2952/ 159576 | consumed samples: 47232 | elapsed time per iteration (ms): 13999.0 | learning rate: 1.308E-05 | global batch size: 16 | lm loss: 6.593210E+00 | loss scale: 16384.0 | grad norm: 72293.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2953/ 159576 | consumed samples: 47248 | elapsed time per iteration (ms): 13609.1 | learning rate: 1.309E-05 | global batch size: 16 | lm loss: 6.662547E+00 | loss scale: 16384.0 | grad norm: 49910.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2954/ 159576 | consumed samples: 47280 | elapsed time per iteration (ms): 14635.0 | learning rate: 1.310E-05 | global batch size: 32 | lm loss: 6.688079E+00 | loss scale: 16384.0 | grad norm: 111598.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2955/ 159576 | consumed samples: 47312 | elapsed time per iteration (ms): 14591.8 | learning rate: 1.311E-05 | global batch size: 32 | lm loss: 6.657289E+00 | loss scale: 16384.0 | grad norm: 67597.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2956/ 159576 | consumed samples: 47344 | elapsed time per iteration (ms): 15030.0 | learning rate: 1.311E-05 | global batch size: 32 | lm loss: 6.554570E+00 | loss scale: 16384.0 | grad norm: 69780.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2957/ 159576 | consumed samples: 47376 | elapsed time per iteration (ms): 14563.7 | learning rate: 1.312E-05 | global batch size: 32 | lm loss: 6.741304E+00 | loss scale: 16384.0 | grad norm: 58633.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2958/ 159576 | consumed samples: 47408 | elapsed time per iteration (ms): 14589.9 | learning rate: 1.313E-05 | global batch size: 32 | lm loss: 6.601515E+00 | loss scale: 16384.0 | grad norm: 107295.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2959/ 159576 | consumed samples: 47440 | elapsed time per iteration (ms): 14625.1 | learning rate: 1.314E-05 | global batch size: 32 | lm loss: 6.683945E+00 | loss scale: 16384.0 | grad norm: 81347.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2960/ 159576 | consumed samples: 47472 | elapsed time per iteration (ms): 14964.2 | learning rate: 1.315E-05 | global batch size: 32 | lm loss: 6.790781E+00 | loss scale: 16384.0 | grad norm: 77191.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2961/ 159576 | consumed samples: 47504 | elapsed time per iteration (ms): 14557.0 | learning rate: 1.316E-05 | global batch size: 32 | lm loss: 6.749201E+00 | loss scale: 16384.0 | grad norm: 82408.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2962/ 159576 | consumed samples: 47536 | elapsed time per iteration (ms): 14666.5 | learning rate: 1.317E-05 | global batch size: 32 | lm loss: 6.532114E+00 | loss scale: 16384.0 | grad norm: 51870.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2963/ 159576 | consumed samples: 47568 | elapsed time per iteration (ms): 14537.9 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.660976E+00 | loss scale: 16384.0 | grad norm: 66392.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
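From iteration 2954 the global batch size doubles from 16 to 32, consumed samples start advancing by 32 per step, and the learning rate begins stepping twice as fast per iteration, which is consistent with a sample-based LR schedule combined with Megatron-style batch-size ramp-up. The sketch below illustrates that ramp-up logic only; the start, increment, target, and ramp-sample values are assumptions for illustration, not the actual tr8-104B configuration.

# Illustrative Megatron-style batch-size ramp-up (all parameter defaults are
# assumptions): the global batch size starts small and grows by a fixed
# increment every `ramp_samples // n_increments` consumed samples until it
# reaches the target.
def global_batch_size(consumed_samples: int,
                      start: int = 16,
                      increment: int = 16,
                      target: int = 2048,
                      ramp_samples: int = 6_000_000) -> int:
    n_increments = (target - start) // increment
    samples_per_increment = ramp_samples // n_increments
    steps = consumed_samples // samples_per_increment
    return min(start + steps * increment, target)

Because the schedule is keyed to consumed samples rather than iterations, doubling the batch size also doubles how far the learning-rate warmup advances per iteration, matching the faster lr increments visible after iteration 2954.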
iteration 2964/ 159576 | consumed samples: 47600 | elapsed time per iteration (ms): 15078.8 | learning rate: 1.318E-05 | global batch size: 32 | lm loss: 6.526144E+00 | loss scale: 16384.0 | grad norm: 54716.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2965/ 159576 | consumed samples: 47632 | elapsed time per iteration (ms): 14737.9 | learning rate: 1.319E-05 | global batch size: 32 | lm loss: 6.649373E+00 | loss scale: 16384.0 | grad norm: 51359.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2966/ 159576 | consumed samples: 47664 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.320E-05 | global batch size: 32 | lm loss: 6.672748E+00 | loss scale: 16384.0 | grad norm: 73789.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2967/ 159576 | consumed samples: 47696 | elapsed time per iteration (ms): 14642.3 | learning rate: 1.321E-05 | global batch size: 32 | lm loss: 6.662704E+00 | loss scale: 16384.0 | grad norm: 66303.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2968/ 159576 | consumed samples: 47728 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.322E-05 | global batch size: 32 | lm loss: 6.624488E+00 | loss scale: 16384.0 | grad norm: 59052.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2969/ 159576 | consumed samples: 47760 | elapsed time per iteration (ms): 14836.6 | learning rate: 1.323E-05 | global batch size: 32 | lm loss: 6.600084E+00 | loss scale: 16384.0 | grad norm: 62547.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2970/ 159576 | consumed samples: 47792 | elapsed time per iteration (ms): 14593.7 | learning rate: 1.324E-05 | global batch size: 32 | lm loss: 6.517389E+00 | loss scale: 16384.0 | grad norm: 60694.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2971/ 159576 | consumed samples: 47824 | elapsed time per iteration (ms): 14618.4 | learning rate: 1.325E-05 | global batch size: 32 | lm loss: 6.548014E+00 | loss scale: 16384.0 | grad norm: 43913.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2972/ 159576 | consumed samples: 47856 | elapsed time per iteration (ms): 14695.6 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.593935E+00 | loss scale: 16384.0 | grad norm: 63488.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2973/ 159576 | consumed samples: 47888 | elapsed time per iteration (ms): 14827.1 | learning rate: 1.326E-05 | global batch size: 32 | lm loss: 6.572222E+00 | loss scale: 16384.0 | grad norm: 54368.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2974/ 159576 | consumed samples: 47920 | elapsed time per iteration (ms): 14620.6 | learning rate: 1.327E-05 | global batch size: 32 | lm loss: 6.550548E+00 | loss scale: 16384.0 | grad norm: 87940.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2975/ 159576 | consumed samples: 47952 | elapsed time per iteration (ms): 14622.4 | learning rate: 1.328E-05 | global batch size: 32 | lm loss: 6.529421E+00 | loss scale: 16384.0 | grad norm: 60145.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2976/ 159576 | consumed samples: 47984 | elapsed time per iteration (ms): 14586.4 | learning rate: 1.329E-05 | global batch size: 32 | lm loss: 6.765855E+00 | loss scale: 16384.0 | grad norm: 83899.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2977/ 159576 | consumed samples: 48016 | elapsed time per iteration (ms): 14810.9 | learning rate: 1.330E-05 | global batch size: 32 | lm loss: 6.630699E+00 | loss scale: 16384.0 | grad norm: 44149.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2978/ 159576 | consumed samples: 48048 | elapsed time per iteration (ms): 14685.4 | learning rate: 1.331E-05 | global batch size: 32 | lm loss: 6.561995E+00 | loss scale: 16384.0 | grad norm: 87446.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2979/ 159576 | consumed samples: 48080 | elapsed time per iteration (ms): 14648.9 | learning rate: 1.332E-05 | global batch size: 32 | lm loss: 6.467924E+00 | loss scale: 16384.0 | grad norm: 65034.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2980/ 159576 | consumed samples: 48112 | elapsed time per iteration (ms): 14615.3 | learning rate: 1.333E-05 | global batch size: 32 | lm loss: 6.649030E+00 | loss scale: 16384.0 | grad norm: 92148.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2981/ 159576 | consumed samples: 48144 | elapsed time per iteration (ms): 14681.7 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 6.749784E+00 | loss scale: 16384.0 | grad norm: 61670.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2982/ 159576 | consumed samples: 48176 | elapsed time per iteration (ms): 14509.6 | learning rate: 1.334E-05 | global batch size: 32 | lm loss: 6.567672E+00 | loss scale: 16384.0 | grad norm: 79628.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2983/ 159576 | consumed samples: 48208 | elapsed time per iteration (ms): 14555.2 | learning rate: 1.335E-05 | global batch size: 32 | lm loss: 6.676024E+00 | loss scale: 16384.0 | grad norm: 65136.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2984/ 159576 | consumed samples: 48240 | elapsed time per iteration (ms): 14572.2 | learning rate: 1.336E-05 | global batch size: 32 | lm loss: 6.467518E+00 | loss scale: 16384.0 | grad norm: 90637.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2985/ 159576 | consumed samples: 48272 | elapsed time per iteration (ms): 14888.7 | learning rate: 1.337E-05 | global batch size: 32 | lm loss: 6.586103E+00 | loss scale: 16384.0 | grad norm: 81306.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2986/ 159576 | consumed samples: 48304 | elapsed time per iteration (ms): 14588.0 | learning rate: 1.338E-05 | global batch size: 32 | lm loss: 6.541125E+00 | loss scale: 16384.0 | grad norm: 62368.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2987/ 159576 | consumed samples: 48336 | elapsed time per iteration (ms): 14597.9 | learning rate: 1.339E-05 | global batch size: 32 | lm loss: 6.591407E+00 | loss scale: 16384.0 | grad norm: 87504.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2988/ 159576 | consumed samples: 48368 | elapsed time per iteration (ms): 14590.3 | learning rate: 1.340E-05 | global batch size: 32 | lm loss: 6.678365E+00 | loss scale: 16384.0 | grad norm: 78293.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2989/ 159576 | consumed samples: 48400 | elapsed time per iteration (ms): 15031.9 | learning rate: 1.341E-05 | global batch size: 32 | lm loss: 6.564939E+00 | loss scale: 16384.0 | grad norm: 77173.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2990/ 159576 | consumed samples: 48432 | elapsed time per iteration (ms): 14705.4 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.692814E+00 | loss scale: 16384.0 | grad norm: 57544.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2991/ 159576 | consumed samples: 48464 | elapsed time per iteration (ms): 14586.3 | learning rate: 1.342E-05 | global batch size: 32 | lm loss: 6.628499E+00 | loss scale: 16384.0 | grad norm: 75164.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2992/ 159576 | consumed samples: 48496 | elapsed time per iteration (ms): 14624.5 | learning rate: 1.343E-05 | global batch size: 32 | lm loss: 6.582328E+00 | loss scale: 16384.0 | grad norm: 79666.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2993/ 159576 | consumed samples: 48528 | elapsed time per iteration (ms): 14950.3 | learning rate: 1.344E-05 | global batch size: 32 | lm loss: 6.558386E+00 | loss scale: 16384.0 | grad norm: 55234.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2994/ 159576 | consumed samples: 48560 | elapsed time per iteration (ms): 14695.8 | learning rate: 1.345E-05 | global batch size: 32 | lm loss: 6.676173E+00 | loss scale: 16384.0 | grad norm: 99524.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2995/ 159576 | consumed samples: 48592 | elapsed time per iteration (ms): 14559.9 | learning rate: 1.346E-05 | global batch size: 32 | lm loss: 6.529976E+00 | loss scale: 16384.0 | grad norm: 70438.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2996/ 159576 | consumed samples: 48624 | elapsed time per iteration (ms): 14633.0 | learning rate: 1.347E-05 | global batch size: 32 | lm loss: 6.703721E+00 | loss scale: 16384.0 | grad norm: 67881.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2997/ 159576 | consumed samples: 48656 | elapsed time per iteration (ms): 14953.9 | learning rate: 1.348E-05 | global batch size: 32 | lm loss: 6.681076E+00 | loss scale: 16384.0 | grad norm: 65351.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2998/ 159576 | consumed samples: 48688 | elapsed time per iteration (ms): 14564.8 | learning rate: 1.349E-05 | global batch size: 32 | lm loss: 6.741710E+00 | loss scale: 16384.0 | grad norm: 80607.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2999/ 159576 | consumed samples: 48720 | elapsed time per iteration (ms): 14589.6 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.628071E+00 | loss scale: 16384.0 | grad norm: 80325.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3000/ 159576 | consumed samples: 48752 | elapsed time per iteration (ms): 14580.8 | learning rate: 1.350E-05 | global batch size: 32 | lm loss: 6.770764E+00 | loss scale: 16384.0 | grad norm: 62881.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 3000 | lm loss value: 6.650529E+00 | lm loss PPL: 7.731932E+02 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 13:45:23,476] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step3000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 19970.36
iteration 3001/ 159576 | consumed samples: 48784 | elapsed time per iteration (ms): 40923.5 | learning rate: 1.351E-05 | global batch size: 32 | lm loss: 6.512466E+00 | loss scale: 16384.0 | grad norm: 78526.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3002/ 159576 | consumed samples: 48816 | elapsed time per iteration (ms): 14454.5 | learning rate: 1.352E-05 | global batch size: 32 | lm loss: 6.725769E+00 | loss scale: 16384.0 | grad norm: 52532.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3003/ 159576 | consumed samples: 48848 | elapsed time per iteration (ms): 14508.9 | learning rate: 1.353E-05 | global batch size: 32 | lm loss: 6.778904E+00 | loss scale: 16384.0 | grad norm: 61815.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3004/ 159576 | consumed samples: 48880 | elapsed time per iteration (ms): 14774.8 | learning rate: 1.354E-05 | global batch size: 32 | lm loss: 6.600959E+00 | loss scale: 16384.0 | grad norm: 72563.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3005/ 159576 | consumed samples: 48912 | elapsed time per iteration (ms): 14543.7 | learning rate: 1.355E-05 | global batch size: 32 | lm loss: 6.630536E+00 | loss scale: 16384.0 | grad norm: 52120.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3006/ 159576 | consumed samples: 48944 | elapsed time per iteration (ms): 14501.8 | learning rate: 1.356E-05 | global batch size: 32 | lm loss: 6.661976E+00 | loss scale: 16384.0 | grad norm: 60799.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3007/ 159576 | consumed samples: 48976 | elapsed time per iteration (ms): 14465.0 | learning rate: 1.357E-05 | global batch size: 32 | lm loss: 6.695879E+00 | loss scale: 16384.0 | grad norm: 55470.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3008/ 159576 | consumed samples: 49008 | elapsed time per iteration (ms): 14696.5 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.613426E+00 | loss scale: 16384.0 | grad norm: 80502.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
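Two sanity checks on the iteration-3000 block above: the reported PPL is just exp of the lm loss, and the ~20 s checkpoint save is what inflates the elapsed time of iteration 3001 (40923.5 ms against the usual ~14.6 s at this batch size):

```python
import math

# validation loss -> perplexity
print(math.exp(6.650529))    # ~773.19, matching "lm loss PPL: 7.731932E+02"

# rough checkpoint overhead folded into iteration 3001 (values from the log;
# 14600 ms is an eyeballed typical iteration time, not a logged number)
print(40923.5 - 14600.0)     # ~26.3 s of extra wall time around the save
print(19970.36 / 1000)       # ~20 s of that is the save-checkpoint timer itself
```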
iteration 3009/ 159576 | consumed samples: 49040 | elapsed time per iteration (ms): 14441.9 | learning rate: 1.358E-05 | global batch size: 32 | lm loss: 6.640174E+00 | loss scale: 16384.0 | grad norm: 53100.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3010/ 159576 | consumed samples: 49072 | elapsed time per iteration (ms): 14484.3 | learning rate: 1.359E-05 | global batch size: 32 | lm loss: 6.660203E+00 | loss scale: 16384.0 | grad norm: 69573.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3011/ 159576 | consumed samples: 49104 | elapsed time per iteration (ms): 14599.1 | learning rate: 1.360E-05 | global batch size: 32 | lm loss: 6.674448E+00 | loss scale: 16384.0 | grad norm: 49737.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3012/ 159576 | consumed samples: 49136 | elapsed time per iteration (ms): 14701.4 | learning rate: 1.361E-05 | global batch size: 32 | lm loss: 6.607582E+00 | loss scale: 16384.0 | grad norm: 121923.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3013/ 159576 | consumed samples: 49168 | elapsed time per iteration (ms): 14527.2 | learning rate: 1.362E-05 | global batch size: 32 | lm loss: 6.552118E+00 | loss scale: 16384.0 | grad norm: 86117.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3014/ 159576 | consumed samples: 49200 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.363E-05 | global batch size: 32 | lm loss: 6.628557E+00 | loss scale: 16384.0 | grad norm: 65341.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3015/ 159576 | consumed samples: 49232 | elapsed time per iteration (ms): 14528.2 | learning rate: 1.364E-05 | global batch size: 32 | lm loss: 6.637073E+00 | loss scale: 16384.0 | grad norm: 56388.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3016/ 159576 | consumed samples: 49264 | elapsed time per iteration (ms): 14818.6 | learning rate: 1.365E-05 | global batch size: 32 | lm loss: 6.643037E+00 | loss scale: 16384.0 | grad norm: 92476.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3017/ 159576 | consumed samples: 49296 | elapsed time per iteration (ms): 14532.4 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.517512E+00 | loss scale: 16384.0 | grad norm: 69528.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3018/ 159576 | consumed samples: 49328 | elapsed time per iteration (ms): 14482.9 | learning rate: 1.366E-05 | global batch size: 32 | lm loss: 6.593336E+00 | loss scale: 16384.0 | grad norm: 58227.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3019/ 159576 | consumed samples: 49360 | elapsed time per iteration (ms): 14483.3 | learning rate: 1.367E-05 | global batch size: 32 | lm loss: 6.682046E+00 | loss scale: 16384.0 | grad norm: 77807.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3020/ 159576 | consumed samples: 49392 | elapsed time per iteration (ms): 15039.4 | learning rate: 1.368E-05 | global batch size: 32 | lm loss: 6.511760E+00 | loss scale: 16384.0 | grad norm: 61711.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3021/ 159576 | consumed samples: 49424 | elapsed time per iteration (ms): 14532.3 | learning rate: 1.369E-05 | global batch size: 32 | lm loss: 6.601027E+00 | loss scale: 16384.0 | grad norm: 59045.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3022/ 159576 | consumed samples: 49456 | elapsed time per iteration (ms): 14411.9 | learning rate: 1.370E-05 | global batch size: 32 | lm loss: 6.669757E+00 | loss scale: 16384.0 | grad norm: 79072.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3023/ 159576 | consumed samples: 49488 | elapsed time per iteration (ms): 14433.5 | learning rate: 1.371E-05 | global batch size: 32 | lm loss: 6.660283E+00 | loss scale: 16384.0 | grad norm: 83581.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3024/ 159576 | consumed samples: 49520 | elapsed time per iteration (ms): 14915.2 | learning rate: 1.372E-05 | global batch size: 32 | lm loss: 6.621551E+00 | loss scale: 16384.0 | grad norm: 64854.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3025/ 159576 | consumed samples: 49552 | elapsed time per iteration (ms): 14425.9 | learning rate: 1.373E-05 | global batch size: 32 | lm loss: 6.591113E+00 | loss scale: 16384.0 | grad norm: 52620.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3026/ 159576 | consumed samples: 49584 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.659728E+00 | loss scale: 16384.0 | grad norm: 50471.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3027/ 159576 | consumed samples: 49616 | elapsed time per iteration (ms): 14493.7 | learning rate: 1.374E-05 | global batch size: 32 | lm loss: 6.786015E+00 | loss scale: 16384.0 | grad norm: 89599.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3028/ 159576 | consumed samples: 49648 | elapsed time per iteration (ms): 14955.9 | learning rate: 1.375E-05 | global batch size: 32 | lm loss: 6.515626E+00 | loss scale: 16384.0 | grad norm: 71757.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3029/ 159576 | consumed samples: 49680 | elapsed time per iteration (ms): 14451.8 | learning rate: 1.376E-05 | global batch size: 32 | lm loss: 6.552487E+00 | loss scale: 16384.0 | grad norm: 59493.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3030/ 159576 | consumed samples: 49712 | elapsed time per iteration (ms): 14565.2 | learning rate: 1.377E-05 | global batch size: 32 | lm loss: 6.515723E+00 | loss scale: 16384.0 | grad norm: 70621.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3031/ 159576 | consumed samples: 49744 | elapsed time per iteration (ms): 14573.9 | learning rate: 1.378E-05 | global batch size: 32 | lm loss: 6.533678E+00 | loss scale: 16384.0 | grad norm: 67416.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3032/ 159576 | consumed samples: 49776 | elapsed time per iteration (ms): 14838.7 | learning rate: 1.379E-05 | global batch size: 32 | lm loss: 6.558086E+00 | loss scale: 16384.0 | grad norm: 57733.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3033/ 159576 | consumed samples: 49808 | elapsed time per iteration (ms): 14602.8 | learning rate: 1.380E-05 | global batch size: 32 | lm loss: 6.520467E+00 | loss scale: 16384.0 | grad norm: 82103.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3034/ 159576 | consumed samples: 49840 | elapsed time per iteration (ms): 14562.2 | learning rate: 1.381E-05 | global batch size: 32 | lm loss: 6.583010E+00 | loss scale: 16384.0 | grad norm: 49461.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3035/ 159576 | consumed samples: 49872 | elapsed time per iteration (ms): 14551.2 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.614191E+00 | loss scale: 16384.0 | grad norm: 42934.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3036/ 159576 | consumed samples: 49904 | elapsed time per iteration (ms): 15033.1 | learning rate: 1.382E-05 | global batch size: 32 | lm loss: 6.646058E+00 | loss scale: 16384.0 | grad norm: 72475.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3037/ 159576 | consumed samples: 49936 | elapsed time per iteration (ms): 14506.7 | learning rate: 1.383E-05 | global batch size: 32 | lm loss: 6.657450E+00 | loss scale: 16384.0 | grad norm: 51862.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3038/ 159576 | consumed samples: 49968 | elapsed time per iteration (ms): 14535.4 | learning rate: 1.384E-05 | global batch size: 32 | lm loss: 6.474831E+00 | loss scale: 16384.0 | grad norm: 54826.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3039/ 159576 | consumed samples: 50000 | elapsed time per iteration (ms): 14517.2 | learning rate: 1.385E-05 | global batch size: 32 | lm loss: 6.491888E+00 | loss scale: 16384.0 | grad norm: 48045.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3040/ 159576 | consumed samples: 50032 | elapsed time per iteration (ms): 14679.0 | learning rate: 1.386E-05 | global batch size: 32 | lm loss: 6.557182E+00 | loss scale: 16384.0 | grad norm: 79148.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3041/ 159576 | consumed samples: 50064 | elapsed time per iteration (ms): 14829.2 | learning rate: 1.387E-05 | global batch size: 32 | lm loss: 6.624621E+00 | loss scale: 16384.0 | grad norm: 50930.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3042/ 159576 | consumed samples: 50096 | elapsed time per iteration (ms): 14560.9 | learning rate: 1.388E-05 | global batch size: 32 | lm loss: 6.572658E+00 | loss scale: 16384.0 | grad norm: 72539.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3043/ 159576 | consumed samples: 50128 | elapsed time per iteration (ms): 14616.0 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.654581E+00 | loss scale: 16384.0 | grad norm: 66089.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3044/ 159576 | consumed samples: 50160 | elapsed time per iteration (ms): 14597.6 | learning rate: 1.389E-05 | global batch size: 32 | lm loss: 6.568760E+00 | loss scale: 16384.0 | grad norm: 77389.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3045/ 159576 | consumed samples: 50192 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.390E-05 | global batch size: 32 | lm loss: 6.562954E+00 | loss scale: 16384.0 | grad norm: 59175.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3046/ 159576 | consumed samples: 50224 | elapsed time per iteration (ms): 14549.8 | learning rate: 1.391E-05 | global batch size: 32 | lm loss: 6.519083E+00 | loss scale: 16384.0 | grad norm: 72573.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3047/ 159576 | consumed samples: 50256 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.392E-05 | global batch size: 32 | lm loss: 6.586189E+00 | loss scale: 16384.0 | grad norm: 63454.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3048/ 159576 | consumed samples: 50288 | elapsed time per iteration (ms): 14699.8 | learning rate: 1.393E-05 | global batch size: 32 | lm loss: 6.629214E+00 | loss scale: 16384.0 | grad norm: 49137.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3049/ 159576 | consumed samples: 50320 | elapsed time per iteration (ms): 14760.5 | learning rate: 1.394E-05 | global batch size: 32 | lm loss: 6.567476E+00 | loss scale: 16384.0 | grad norm: 59423.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3050/ 159576 | consumed samples: 50352 | elapsed time per iteration (ms): 14605.2 | learning rate: 1.395E-05 | global batch size: 32 | lm loss: 6.560441E+00 | loss scale: 16384.0 | grad norm: 76106.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3051/ 159576 | consumed samples: 50384 | elapsed time per iteration (ms): 14589.0 | learning rate: 1.396E-05 | global batch size: 32 | lm loss: 6.676329E+00 | loss scale: 16384.0 | grad norm: 43490.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3052/ 159576 | consumed samples: 50416 | elapsed time per iteration (ms): 14546.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.531154E+00 | loss scale: 16384.0 | grad norm: 77324.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3053/ 159576 | consumed samples: 50448 | elapsed time per iteration (ms): 14689.5 | learning rate: 1.397E-05 | global batch size: 32 | lm loss: 6.457368E+00 | loss scale: 16384.0 | grad norm: 61005.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3054/ 159576 | consumed samples: 50480 | elapsed time per iteration (ms): 14604.5 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 6.694659E+00 | loss scale: 16384.0 | grad norm: 50570.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
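Throughout this stretch the learning rate is still in warmup and is linear in consumed samples, at roughly 2.77e-10 per sample; spot-checking three records from the log:

```python
# lr / consumed-samples ratio for three records above: it is constant to the
# precision the log prints, i.e. a sample-based linear warmup.
for consumed, lr in [(47056, 1.303e-05), (50000, 1.385e-05), (54512, 1.510e-05)]:
    print(f"{lr / consumed:.3e}")   # ~2.77e-10 each time
```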
iteration 3055/ 159576 | consumed samples: 50512 | elapsed time per iteration (ms): 14507.3 | learning rate: 1.399E-05 | global batch size: 32 | lm loss: 6.639795E+00 | loss scale: 16384.0 | grad norm: 57017.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3056/ 159576 | consumed samples: 50544 | elapsed time per iteration (ms): 14581.4 | learning rate: 1.400E-05 | global batch size: 32 | lm loss: 6.619573E+00 | loss scale: 16384.0 | grad norm: 60323.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3057/ 159576 | consumed samples: 50576 | elapsed time per iteration (ms): 15078.3 | learning rate: 1.401E-05 | global batch size: 32 | lm loss: 6.636419E+00 | loss scale: 16384.0 | grad norm: 49598.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3058/ 159576 | consumed samples: 50608 | elapsed time per iteration (ms): 14576.1 | learning rate: 1.402E-05 | global batch size: 32 | lm loss: 6.591126E+00 | loss scale: 16384.0 | grad norm: 102052.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3059/ 159576 | consumed samples: 50640 | elapsed time per iteration (ms): 14515.1 | learning rate: 1.403E-05 | global batch size: 32 | lm loss: 6.500241E+00 | loss scale: 16384.0 | grad norm: 52981.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3060/ 159576 | consumed samples: 50672 | elapsed time per iteration (ms): 14582.7 | learning rate: 1.404E-05 | global batch size: 32 | lm loss: 6.553960E+00 | loss scale: 16384.0 | grad norm: 57341.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3061/ 159576 | consumed samples: 50704 | elapsed time per iteration (ms): 14939.5 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.593186E+00 | loss scale: 16384.0 | grad norm: 50198.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3062/ 159576 | consumed samples: 50736 | elapsed time per iteration (ms): 14545.7 | learning rate: 1.405E-05 | global batch size: 32 | lm loss: 6.577888E+00 | loss scale: 16384.0 | grad norm: 90008.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3063/ 159576 | consumed samples: 50768 | elapsed time per iteration (ms): 14515.8 | learning rate: 1.406E-05 | global batch size: 32 | lm loss: 6.775355E+00 | loss scale: 16384.0 | grad norm: 52343.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3064/ 159576 | consumed samples: 50800 | elapsed time per iteration (ms): 14570.2 | learning rate: 1.407E-05 | global batch size: 32 | lm loss: 6.724249E+00 | loss scale: 16384.0 | grad norm: 69939.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3065/ 159576 | consumed samples: 50832 | elapsed time per iteration (ms): 14913.0 | learning rate: 1.408E-05 | global batch size: 32 | lm loss: 6.634195E+00 | loss scale: 16384.0 | grad norm: 70070.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3066/ 159576 | consumed samples: 50864 | elapsed time per iteration (ms): 14497.8 | learning rate: 1.409E-05 | global batch size: 32 | lm loss: 6.591150E+00 | loss scale: 16384.0 | grad norm: 80109.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3067/ 159576 | consumed samples: 50896 | elapsed time per iteration (ms): 14593.4 | learning rate: 1.410E-05 | global batch size: 32 | lm loss: 6.637640E+00 | loss scale: 16384.0 | grad norm: 51104.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3068/ 159576 | consumed samples: 50928 | elapsed time per iteration (ms): 14459.7 | learning rate: 1.411E-05 | global batch size: 32 | lm loss: 6.595787E+00 | loss scale: 16384.0 | grad norm: 49458.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3069/ 159576 | consumed samples: 50960 | elapsed time per iteration (ms): 14904.6 | learning rate: 1.412E-05 | global batch size: 32 | lm loss: 6.762650E+00 | loss scale: 16384.0 | grad norm: 88087.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3070/ 159576 | consumed samples: 50992 | elapsed time per iteration (ms): 14578.7 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.615232E+00 | loss scale: 16384.0 | grad norm: 50851.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3071/ 159576 | consumed samples: 51024 | elapsed time per iteration (ms): 14534.9 | learning rate: 1.413E-05 | global batch size: 32 | lm loss: 6.502337E+00 | loss scale: 16384.0 | grad norm: 82199.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3072/ 159576 | consumed samples: 51056 | elapsed time per iteration (ms): 14555.3 | learning rate: 1.414E-05 | global batch size: 32 | lm loss: 6.552182E+00 | loss scale: 16384.0 | grad norm: 67542.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3073/ 159576 | consumed samples: 51088 | elapsed time per iteration (ms): 15069.2 | learning rate: 1.415E-05 | global batch size: 32 | lm loss: 6.449011E+00 | loss scale: 16384.0 | grad norm: 113973.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3074/ 159576 | consumed samples: 51120 | elapsed time per iteration (ms): 14473.5 | learning rate: 1.416E-05 | global batch size: 32 | lm loss: 6.462796E+00 | loss scale: 16384.0 | grad norm: 99530.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3075/ 159576 | consumed samples: 51152 | elapsed time per iteration (ms): 14578.5 | learning rate: 1.417E-05 | global batch size: 32 | lm loss: 6.605415E+00 | loss scale: 16384.0 | grad norm: 79580.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3076/ 159576 | consumed samples: 51184 | elapsed time per iteration (ms): 14526.0 | learning rate: 1.418E-05 | global batch size: 32 | lm loss: 6.643724E+00 | loss scale: 16384.0 | grad norm: 83910.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3077/ 159576 | consumed samples: 51216 | elapsed time per iteration (ms): 14932.5 | learning rate: 1.419E-05 | global batch size: 32 | lm loss: 6.554170E+00 | loss scale: 16384.0 | grad norm: 41888.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3078/ 159576 | consumed samples: 51248 | elapsed time per iteration (ms): 14631.5 | learning rate: 1.420E-05 | global batch size: 32 | lm loss: 6.609428E+00 | loss scale: 16384.0 | grad norm: 100795.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3079/ 159576 | consumed samples: 51280 | elapsed time per iteration (ms): 14613.6 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.647438E+00 | loss scale: 16384.0 | grad norm: 79478.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3080/ 159576 | consumed samples: 51312 | elapsed time per iteration (ms): 14624.3 | learning rate: 1.421E-05 | global batch size: 32 | lm loss: 6.548526E+00 | loss scale: 16384.0 | grad norm: 61687.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3081/ 159576 | consumed samples: 51344 | elapsed time per iteration (ms): 14941.2 | learning rate: 1.422E-05 | global batch size: 32 | lm loss: 6.559642E+00 | loss scale: 16384.0 | grad norm: 51017.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3082/ 159576 | consumed samples: 51376 | elapsed time per iteration (ms): 14650.5 | learning rate: 1.423E-05 | global batch size: 32 | lm loss: 6.513590E+00 | loss scale: 16384.0 | grad norm: 62838.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3083/ 159576 | consumed samples: 51408 | elapsed time per iteration (ms): 14595.1 | learning rate: 1.424E-05 | global batch size: 32 | lm loss: 6.454400E+00 | loss scale: 16384.0 | grad norm: 85218.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3084/ 159576 | consumed samples: 51440 | elapsed time per iteration (ms): 14539.5 | learning rate: 1.425E-05 | global batch size: 32 | lm loss: 6.667971E+00 | loss scale: 16384.0 | grad norm: 74883.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3085/ 159576 | consumed samples: 51472 | elapsed time per iteration (ms): 14496.8 | learning rate: 1.426E-05 | global batch size: 32 | lm loss: 6.608503E+00 | loss scale: 16384.0 | grad norm: 64204.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3086/ 159576 | consumed samples: 51504 | elapsed time per iteration (ms): 14686.0 | learning rate: 1.427E-05 | global batch size: 32 | lm loss: 6.699879E+00 | loss scale: 16384.0 | grad norm: 42613.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 14:06:36] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 14:06:36] PULSE: tr8-104B is running for 8:14:25 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 3087/ 159576 | consumed samples: 51536 | elapsed time per iteration (ms): 14518.6 | learning rate: 1.428E-05 | global batch size: 32 | lm loss: 6.539448E+00 | loss scale: 16384.0 | grad norm: 88063.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
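The PULSE lines are the watchdog heartbeat for this training: `1165978_[1-10%1]` is a ten-element SLURM job array throttled to one concurrent job (the `%1`), so the next training chunk stays queued behind the running job `1162855_1` via the dependency mechanism. The reported runtime is simply the current time minus the job's start time:

```python
from datetime import datetime

# reproduce the "running for 8:14:25" figure from the two timestamps in the log
start = datetime.fromisoformat("2021-09-24T05:52:11")
now = datetime.fromisoformat("2021-09-24T14:06:36")  # the PULSE timestamp
print(now - start)   # 8:14:25
```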
iteration 3088/ 159576 | consumed samples: 51568 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.589184E+00 | loss scale: 16384.0 | grad norm: 54256.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3089/ 159576 | consumed samples: 51600 | elapsed time per iteration (ms): 14631.0 | learning rate: 1.429E-05 | global batch size: 32 | lm loss: 6.700484E+00 | loss scale: 16384.0 | grad norm: 54269.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3090/ 159576 | consumed samples: 51632 | elapsed time per iteration (ms): 14830.4 | learning rate: 1.430E-05 | global batch size: 32 | lm loss: 6.576167E+00 | loss scale: 16384.0 | grad norm: 57490.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3091/ 159576 | consumed samples: 51664 | elapsed time per iteration (ms): 14445.4 | learning rate: 1.431E-05 | global batch size: 32 | lm loss: 6.601985E+00 | loss scale: 16384.0 | grad norm: 57872.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3092/ 159576 | consumed samples: 51696 | elapsed time per iteration (ms): 14536.8 | learning rate: 1.432E-05 | global batch size: 32 | lm loss: 6.407238E+00 | loss scale: 16384.0 | grad norm: 52047.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3093/ 159576 | consumed samples: 51728 | elapsed time per iteration (ms): 14606.0 | learning rate: 1.433E-05 | global batch size: 32 | lm loss: 6.659007E+00 | loss scale: 16384.0 | grad norm: 76903.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3094/ 159576 | consumed samples: 51760 | elapsed time per iteration (ms): 14751.8 | learning rate: 1.434E-05 | global batch size: 32 | lm loss: 6.623207E+00 | loss scale: 16384.0 | grad norm: 98639.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3095/ 159576 | consumed samples: 51792 | elapsed time per iteration (ms): 14636.3 | learning rate: 1.435E-05 | global batch size: 32 | lm loss: 6.697064E+00 | loss scale: 16384.0 | grad norm: 59113.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3096/ 159576 | consumed samples: 51824 | elapsed time per iteration (ms): 14701.7 | learning rate: 1.436E-05 | global batch size: 32 | lm loss: 6.510694E+00 | loss scale: 16384.0 | grad norm: 57025.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3097/ 159576 | consumed samples: 51856 | elapsed time per iteration (ms): 14643.0 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.610021E+00 | loss scale: 16384.0 | grad norm: 90059.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3098/ 159576 | consumed samples: 51888 | elapsed time per iteration (ms): 14837.7 | learning rate: 1.437E-05 | global batch size: 32 | lm loss: 6.534551E+00 | loss scale: 16384.0 | grad norm: 45874.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3099/ 159576 | consumed samples: 51920 | elapsed time per iteration (ms): 14607.4 | learning rate: 1.438E-05 | global batch size: 32 | lm loss: 6.517954E+00 | loss scale: 16384.0 | grad norm: 60226.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3100/ 159576 | consumed samples: 51952 | elapsed time per iteration (ms): 14537.4 | learning rate: 1.439E-05 | global batch size: 32 | lm loss: 6.457252E+00 | loss scale: 16384.0 | grad norm: 46090.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3101/ 159576 | consumed samples: 51984 | elapsed time per iteration (ms): 14526.9 | learning rate: 1.440E-05 | global batch size: 32 | lm loss: 6.609892E+00 | loss scale: 16384.0 | grad norm: 94724.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3102/ 159576 | consumed samples: 52016 | elapsed time per iteration (ms): 14927.9 | learning rate: 1.441E-05 | global batch size: 32 | lm loss: 6.698421E+00 | loss scale: 16384.0 | grad norm: 87402.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3103/ 159576 | consumed samples: 52048 | elapsed time per iteration (ms): 14723.0 | learning rate: 1.442E-05 | global batch size: 32 | lm loss: 6.607485E+00 | loss scale: 16384.0 | grad norm: 53552.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3104/ 159576 | consumed samples: 52080 | elapsed time per iteration (ms): 14655.6 | learning rate: 1.443E-05 | global batch size: 32 | lm loss: 6.771776E+00 | loss scale: 16384.0 | grad norm: 77470.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3105/ 159576 | consumed samples: 52112 | elapsed time per iteration (ms): 14632.7 | learning rate: 1.444E-05 | global batch size: 32 | lm loss: 6.573309E+00 | loss scale: 16384.0 | grad norm: 60932.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3106/ 159576 | consumed samples: 52144 | elapsed time per iteration (ms): 15115.7 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.610741E+00 | loss scale: 16384.0 | grad norm: 67949.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3107/ 159576 | consumed samples: 52176 | elapsed time per iteration (ms): 14559.3 | learning rate: 1.445E-05 | global batch size: 32 | lm loss: 6.538753E+00 | loss scale: 16384.0 | grad norm: 71734.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3108/ 159576 | consumed samples: 52208 | elapsed time per iteration (ms): 14588.4 | learning rate: 1.446E-05 | global batch size: 32 | lm loss: 6.527990E+00 | loss scale: 16384.0 | grad norm: 86170.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3109/ 159576 | consumed samples: 52240 | elapsed time per iteration (ms): 14660.3 | learning rate: 1.447E-05 | global batch size: 32 | lm loss: 6.556553E+00 | loss scale: 16384.0 | grad norm: 46751.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3110/ 159576 | consumed samples: 52272 | elapsed time per iteration (ms): 15046.4 | learning rate: 1.448E-05 | global batch size: 32 | lm loss: 6.566851E+00 | loss scale: 16384.0 | grad norm: 67209.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3111/ 159576 | consumed samples: 52304 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.449E-05 | global batch size: 32 | lm loss: 6.635989E+00 | loss scale: 16384.0 | grad norm: 53538.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3112/ 159576 | consumed samples: 52336 | elapsed time per iteration (ms): 14664.0 | learning rate: 1.450E-05 | global batch size: 32 | lm loss: 6.739109E+00 | loss scale: 16384.0 | grad norm: 100581.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3113/ 159576 | consumed samples: 52368 | elapsed time per iteration (ms): 14690.0 | learning rate: 1.451E-05 | global batch size: 32 | lm loss: 6.534431E+00 | loss scale: 16384.0 | grad norm: 69366.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3114/ 159576 | consumed samples: 52400 | elapsed time per iteration (ms): 14854.6 | learning rate: 1.452E-05 | global batch size: 32 | lm loss: 6.481595E+00 | loss scale: 16384.0 | grad norm: 57933.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3115/ 159576 | consumed samples: 52432 | elapsed time per iteration (ms): 14581.0 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.466241E+00 | loss scale: 16384.0 | grad norm: 91764.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3116/ 159576 | consumed samples: 52464 | elapsed time per iteration (ms): 14603.8 | learning rate: 1.453E-05 | global batch size: 32 | lm loss: 6.818060E+00 | loss scale: 16384.0 | grad norm: 73322.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3117/ 159576 | consumed samples: 52496 | elapsed time per iteration (ms): 14655.4 | learning rate: 1.454E-05 | global batch size: 32 | lm loss: 6.541664E+00 | loss scale: 16384.0 | grad norm: 79876.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3118/ 159576 | consumed samples: 52528 | elapsed time per iteration (ms): 15059.6 | learning rate: 1.455E-05 | global batch size: 32 | lm loss: 6.582567E+00 | loss scale: 16384.0 | grad norm: 57737.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3119/ 159576 | consumed samples: 52560 | elapsed time per iteration (ms): 14561.2 | learning rate: 1.456E-05 | global batch size: 32 | lm loss: 6.616435E+00 | loss scale: 16384.0 | grad norm: 75078.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3120/ 159576 | consumed samples: 52592 | elapsed time per iteration (ms): 14627.9 | learning rate: 1.457E-05 | global batch size: 32 | lm loss: 6.688129E+00 | loss scale: 16384.0 | grad norm: 51450.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3121/ 159576 | consumed samples: 52624 | elapsed time per iteration (ms): 14579.2 | learning rate: 1.458E-05 | global batch size: 32 | lm loss: 6.456697E+00 | loss scale: 16384.0 | grad norm: 69973.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3122/ 159576 | consumed samples: 52656 | elapsed time per iteration (ms): 15025.4 | learning rate: 1.459E-05 | global batch size: 32 | lm loss: 6.629485E+00 | loss scale: 16384.0 | grad norm: 57268.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3123/ 159576 | consumed samples: 52688 | elapsed time per iteration (ms): 14578.8 | learning rate: 1.460E-05 | global batch size: 32 | lm loss: 6.404414E+00 | loss scale: 16384.0 | grad norm: 63882.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3124/ 159576 | consumed samples: 52720 | elapsed time per iteration (ms): 14582.6 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.473093E+00 | loss scale: 16384.0 | grad norm: 50308.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3125/ 159576 | consumed samples: 52752 | elapsed time per iteration (ms): 14640.7 | learning rate: 1.461E-05 | global batch size: 32 | lm loss: 6.497868E+00 | loss scale: 16384.0 | grad norm: 63650.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3126/ 159576 | consumed samples: 52784 | elapsed time per iteration (ms): 15046.6 | learning rate: 1.462E-05 | global batch size: 32 | lm loss: 6.549313E+00 | loss scale: 16384.0 | grad norm: 72289.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3127/ 159576 | consumed samples: 52816 | elapsed time per iteration (ms): 14723.2 | learning rate: 1.463E-05 | global batch size: 32 | lm loss: 6.590129E+00 | loss scale: 16384.0 | grad norm: 47547.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3128/ 159576 | consumed samples: 52848 | elapsed time per iteration (ms): 14552.7 | learning rate: 1.464E-05 | global batch size: 32 | lm loss: 6.731832E+00 | loss scale: 16384.0 | grad norm: 68103.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3129/ 159576 | consumed samples: 52880 | elapsed time per iteration (ms): 14573.2 | learning rate: 1.465E-05 | global batch size: 32 | lm loss: 6.528438E+00 | loss scale: 16384.0 | grad norm: 57671.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3130/ 159576 | consumed samples: 52912 | elapsed time per iteration (ms): 14663.9 | learning rate: 1.466E-05 | global batch size: 32 | lm loss: 6.672345E+00 | loss scale: 16384.0 | grad norm: 42986.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3131/ 159576 | consumed samples: 52944 | elapsed time per iteration (ms): 14852.7 | learning rate: 1.467E-05 | global batch size: 32 | lm loss: 6.489813E+00 | loss scale: 16384.0 | grad norm: 54642.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3132/ 159576 | consumed samples: 52976 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.597792E+00 | loss scale: 16384.0 | grad norm: 52604.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3133/ 159576 | consumed samples: 53008 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.468E-05 | global batch size: 32 | lm loss: 6.527011E+00 | loss scale: 16384.0 | grad norm: 59630.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3134/ 159576 | consumed samples: 53040 | elapsed time per iteration (ms): 14626.4 | learning rate: 1.469E-05 | global batch size: 32 | lm loss: 6.581876E+00 | loss scale: 16384.0 | grad norm: 57219.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3135/ 159576 | consumed samples: 53072 | elapsed time per iteration (ms): 14774.4 | learning rate: 1.470E-05 | global batch size: 32 | lm loss: 6.708944E+00 | loss scale: 16384.0 | grad norm: 55756.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3136/ 159576 | consumed samples: 53104 | elapsed time per iteration (ms): 14618.5 | learning rate: 1.471E-05 | global batch size: 32 | lm loss: 6.679635E+00 | loss scale: 16384.0 | grad norm: 42400.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3137/ 159576 | consumed samples: 53136 | elapsed time per iteration (ms): 14614.4 | learning rate: 1.472E-05 | global batch size: 32 | lm loss: 6.469272E+00 | loss scale: 16384.0 | grad norm: 142351.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3138/ 159576 | consumed samples: 53168 | elapsed time per iteration (ms): 14596.5 | learning rate: 1.473E-05 | global batch size: 32 | lm loss: 6.554899E+00 | loss scale: 16384.0 | grad norm: 98568.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3139/ 159576 | consumed samples: 53200 | elapsed time per iteration (ms): 14719.6 | learning rate: 1.474E-05 | global batch size: 32 | lm loss: 6.618309E+00 | loss scale: 16384.0 | grad norm: 73504.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3140/ 159576 | consumed samples: 53232 | elapsed time per iteration (ms): 14627.2 | learning rate: 1.475E-05 | global batch size: 32 | lm loss: 6.588873E+00 | loss scale: 16384.0 | grad norm: 73534.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3141/ 159576 | consumed samples: 53264 | elapsed time per iteration (ms): 14634.4 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.357007E+00 | loss scale: 16384.0 | grad norm: 84712.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3142/ 159576 | consumed samples: 53296 | elapsed time per iteration (ms): 14717.8 | learning rate: 1.476E-05 | global batch size: 32 | lm loss: 6.623076E+00 | loss scale: 16384.0 | grad norm: 94140.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3143/ 159576 | consumed samples: 53328 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.477E-05 | global batch size: 32 | lm loss: 6.562120E+00 | loss scale: 16384.0 | grad norm: 60657.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3144/ 159576 | consumed samples: 53360 | elapsed time per iteration (ms): 14578.1 | learning rate: 1.478E-05 | global batch size: 32 | lm loss: 6.445246E+00 | loss scale: 16384.0 | grad norm: 61798.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3145/ 159576 | consumed samples: 53392 | elapsed time per iteration (ms): 14616.8 | learning rate: 1.479E-05 | global batch size: 32 | lm loss: 6.440137E+00 | loss scale: 16384.0 | grad norm: 72537.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3146/ 159576 | consumed samples: 53424 | elapsed time per iteration (ms): 14619.6 | learning rate: 1.480E-05 | global batch size: 32 | lm loss: 6.739626E+00 | loss scale: 16384.0 | grad norm: 53372.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3147/ 159576 | consumed samples: 53456 | elapsed time per iteration (ms): 14895.9 | learning rate: 1.481E-05 | global batch size: 32 | lm loss: 6.588343E+00 | loss scale: 16384.0 | grad norm: 132102.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3148/ 159576 | consumed samples: 53488 | elapsed time per iteration (ms): 14681.1 | learning rate: 1.482E-05 | global batch size: 32 | lm loss: 6.551591E+00 | loss scale: 16384.0 | grad norm: 58550.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3149/ 159576 | consumed samples: 53520 | elapsed time per iteration (ms): 14682.3 | learning rate: 1.483E-05 | global batch size: 32 | lm loss: 6.632958E+00 | loss scale: 16384.0 | grad norm: 77007.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3150/ 159576 | consumed samples: 53552 | elapsed time per iteration (ms): 14624.1 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.648820E+00 | loss scale: 16384.0 | grad norm: 86896.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3151/ 159576 | consumed samples: 53584 | elapsed time per iteration (ms): 14845.8 | learning rate: 1.484E-05 | global batch size: 32 | lm loss: 6.446036E+00 | loss scale: 16384.0 | grad norm: 89979.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3152/ 159576 | consumed samples: 53616 | elapsed time per iteration (ms): 14727.8 | learning rate: 1.485E-05 | global batch size: 32 | lm loss: 6.617037E+00 | loss scale: 16384.0 | grad norm: 58488.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3153/ 159576 | consumed samples: 53648 | elapsed time per iteration (ms): 14649.7 | learning rate: 1.486E-05 | global batch size: 32 | lm loss: 6.529748E+00 | loss scale: 16384.0 | grad norm: 74833.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3154/ 159576 | consumed samples: 53680 | elapsed time per iteration (ms): 14647.6 | learning rate: 1.487E-05 | global batch size: 32 | lm loss: 6.562946E+00 | loss scale: 16384.0 | grad norm: 52935.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3155/ 159576 | consumed samples: 53712 | elapsed time per iteration (ms): 15107.7 | learning rate: 1.488E-05 | global batch size: 32 | lm loss: 6.514643E+00 | loss scale: 16384.0 | grad norm: 115570.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3156/ 159576 | consumed samples: 53744 | elapsed time per iteration (ms): 14720.1 | learning rate: 1.489E-05 | global batch size: 32 | lm loss: 6.684644E+00 | loss scale: 16384.0 | grad norm: 80957.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3157/ 159576 | consumed samples: 53776 | elapsed time per iteration (ms): 14692.8 | learning rate: 1.490E-05 | global batch size: 32 | lm loss: 6.519046E+00 | loss scale: 16384.0 | grad norm: 55678.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3158/ 159576 | consumed samples: 53808 | elapsed time per iteration (ms): 14686.5 | learning rate: 1.491E-05 | global batch size: 32 | lm loss: 6.746099E+00 | loss scale: 16384.0 | grad norm: 90492.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3159/ 159576 | consumed samples: 53840 | elapsed time per iteration (ms): 15011.1 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.536778E+00 | loss scale: 16384.0 | grad norm: 71520.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3160/ 159576 | consumed samples: 53872 | elapsed time per iteration (ms): 14579.4 | learning rate: 1.492E-05 | global batch size: 32 | lm loss: 6.666056E+00 | loss scale: 16384.0 | grad norm: 84616.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3161/ 159576 | consumed samples: 53904 | elapsed time per iteration (ms): 14644.1 | learning rate: 1.493E-05 | global batch size: 32 | lm loss: 6.597644E+00 | loss scale: 16384.0 | grad norm: 75093.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3162/ 159576 | consumed samples: 53936 | elapsed time per iteration (ms): 14697.1 | learning rate: 1.494E-05 | global batch size: 32 | lm loss: 6.446161E+00 | loss scale: 16384.0 | grad norm: 65649.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3163/ 159576 | consumed samples: 53968 | elapsed time per iteration (ms): 14947.2 | learning rate: 1.495E-05 | global batch size: 32 | lm loss: 6.681765E+00 | loss scale: 16384.0 | grad norm: 60219.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3164/ 159576 | consumed samples: 54000 | elapsed time per iteration (ms): 14663.4 | learning rate: 1.496E-05 | global batch size: 32 | lm loss: 6.525707E+00 | loss scale: 16384.0 | grad norm: 68154.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3165/ 159576 | consumed samples: 54032 | elapsed time per iteration (ms): 14769.3 | learning rate: 1.497E-05 | global batch size: 32 | lm loss: 6.587021E+00 | loss scale: 16384.0 | grad norm: 78180.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3166/ 159576 | consumed samples: 54064 | elapsed time per iteration (ms): 14610.2 | learning rate: 1.498E-05 | global batch size: 32 | lm loss: 6.519161E+00 | loss scale: 16384.0 | grad norm: 61912.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3167/ 159576 | consumed samples: 54096 | elapsed time per iteration (ms): 14999.0 | learning rate: 1.499E-05 | global batch size: 32 | lm loss: 6.632318E+00 | loss scale: 16384.0 | grad norm: 108253.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3168/ 159576 | consumed samples: 54128 | elapsed time per iteration (ms): 14650.1 | learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.465475E+00 | loss scale: 16384.0 | grad norm: 62950.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3169/ 159576 | consumed samples: 54160 | elapsed time per iteration (ms): 14661.3 | learning rate: 1.500E-05 | global batch size: 32 | lm loss: 6.539711E+00 | loss scale: 16384.0 | grad norm: 92615.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3170/ 159576 | consumed samples: 54192 | elapsed time per iteration (ms): 14674.1 | learning rate: 1.501E-05 | global batch size: 32 | lm loss: 6.579189E+00 | loss scale: 16384.0 | grad norm: 83785.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3171/ 159576 | consumed samples: 54224 | elapsed time per iteration (ms): 15070.8 | learning rate: 1.502E-05 | global batch size: 32 | lm loss: 6.793476E+00 | loss scale: 16384.0 | grad norm: 62540.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3172/ 159576 | consumed samples: 54256 | elapsed time per iteration (ms): 14666.7 | learning rate: 1.503E-05 | global batch size: 32 | lm loss: 6.584558E+00 | loss scale: 16384.0 | grad norm: 112108.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3173/ 159576 | consumed samples: 54288 | elapsed time per iteration (ms): 14625.8 | learning rate: 1.504E-05 | global batch size: 32 | lm loss: 6.600308E+00 | loss scale: 16384.0 | grad norm: 74654.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3174/ 159576 | consumed samples: 54320 | elapsed time per iteration (ms): 14636.6 | learning rate: 1.505E-05 | global batch size: 32 | lm loss: 6.586472E+00 | loss scale: 16384.0 | grad norm: 64570.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3175/ 159576 | consumed samples: 54352 | elapsed time per iteration (ms): 15097.6 | learning rate: 1.506E-05 | global batch size: 32 | lm loss: 6.611074E+00 | loss scale: 16384.0 | grad norm: 67988.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3176/ 159576 | consumed samples: 54384 | elapsed time per iteration (ms): 14507.7 | learning rate: 1.507E-05 | global batch size: 32 | lm loss: 6.524911E+00 | loss scale: 16384.0 | grad norm: 52695.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3177/ 159576 | consumed samples: 54416 | elapsed time per iteration (ms): 14667.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.622879E+00 | loss scale: 16384.0 | grad norm: 96311.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3178/ 159576 | consumed samples: 54448 | elapsed time per iteration (ms): 14717.9 | learning rate: 1.508E-05 | global batch size: 32 | lm loss: 6.557679E+00 | loss scale: 16384.0 | grad norm: 75112.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3179/ 159576 | consumed samples: 54480 | elapsed time per iteration (ms): 15028.6 | learning rate: 1.509E-05 | global batch size: 32 | lm loss: 6.508760E+00 | loss scale: 16384.0 | grad norm: 67929.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3180/ 159576 | consumed samples: 54512 | elapsed time per iteration (ms): 14774.6 | learning rate: 1.510E-05 | global batch size: 32 | lm loss: 6.573524E+00 | loss scale: 16384.0 | grad norm: 76526.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0
| time (ms) iteration 3181/ 159576 | consumed samples: 54544 | elapsed time per iteration (ms): 14648.5 | learning rate: 1.511E-05 | global batch size: 32 | lm loss: 6.629518E+00 | loss scale: 16384.0 | grad norm: 51441.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3182/ 159576 | consumed samples: 54576 | elapsed time per iteration (ms): 14620.2 | learning rate: 1.512E-05 | global batch size: 32 | lm loss: 6.528477E+00 | loss scale: 16384.0 | grad norm: 84031.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3183/ 159576 | consumed samples: 54608 | elapsed time per iteration (ms): 14671.0 | learning rate: 1.513E-05 | global batch size: 32 | lm loss: 6.450350E+00 | loss scale: 16384.0 | grad norm: 47787.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3184/ 159576 | consumed samples: 54640 | elapsed time per iteration (ms): 14835.3 | learning rate: 1.514E-05 | global batch size: 32 | lm loss: 6.547495E+00 | loss scale: 16384.0 | grad norm: 57635.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3185/ 159576 | consumed samples: 54672 | elapsed time per iteration (ms): 14691.4 | learning rate: 1.515E-05 | global batch size: 32 | lm loss: 6.438165E+00 | loss scale: 16384.0 | grad norm: 59205.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3186/ 159576 | consumed samples: 54704 | elapsed time per iteration (ms): 14599.9 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.543282E+00 | loss scale: 16384.0 | grad norm: 56916.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3187/ 159576 | consumed samples: 54736 | elapsed time per iteration (ms): 14594.3 | learning rate: 1.516E-05 | global batch size: 32 | lm loss: 6.619707E+00 | loss scale: 16384.0 | grad norm: 87429.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3188/ 159576 | consumed samples: 54768 | elapsed time per iteration (ms): 14717.0 | learning rate: 1.517E-05 | global batch size: 32 | lm loss: 6.575029E+00 | loss scale: 16384.0 | grad norm: 63063.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3189/ 159576 | consumed samples: 54800 | elapsed time per iteration (ms): 14535.7 | learning rate: 1.518E-05 | global batch size: 32 | lm loss: 6.572168E+00 | loss scale: 16384.0 | grad norm: 85759.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3190/ 159576 | consumed samples: 54832 | elapsed time per iteration (ms): 14535.8 | learning rate: 1.519E-05 | global batch size: 32 | lm loss: 6.540303E+00 | loss scale: 16384.0 | grad norm: 59464.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3191/ 159576 | consumed samples: 54864 | elapsed time per iteration (ms): 14477.2 | learning rate: 1.520E-05 | global batch size: 32 | lm loss: 6.545095E+00 | loss scale: 16384.0 | grad norm: 53870.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3192/ 159576 | consumed samples: 54896 | elapsed time per iteration (ms): 14651.8 | learning rate: 1.521E-05 | global batch size: 32 | lm loss: 6.497169E+00 | loss scale: 
16384.0 | grad norm: 50516.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3193/ 159576 | consumed samples: 54928 | elapsed time per iteration (ms): 14555.7 | learning rate: 1.522E-05 | global batch size: 32 | lm loss: 6.354692E+00 | loss scale: 16384.0 | grad norm: 67216.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3194/ 159576 | consumed samples: 54960 | elapsed time per iteration (ms): 14548.6 | learning rate: 1.523E-05 | global batch size: 32 | lm loss: 6.704625E+00 | loss scale: 16384.0 | grad norm: 64544.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3195/ 159576 | consumed samples: 54992 | elapsed time per iteration (ms): 14549.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.489696E+00 | loss scale: 16384.0 | grad norm: 43746.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3196/ 159576 | consumed samples: 55024 | elapsed time per iteration (ms): 14783.1 | learning rate: 1.524E-05 | global batch size: 32 | lm loss: 6.496898E+00 | loss scale: 16384.0 | grad norm: 146573.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3197/ 159576 | consumed samples: 55056 | elapsed time per iteration (ms): 14527.9 | learning rate: 1.525E-05 | global batch size: 32 | lm loss: 6.568567E+00 | loss scale: 16384.0 | grad norm: 78804.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3198/ 159576 | consumed samples: 55088 | elapsed time per iteration (ms): 14523.2 | learning rate: 1.526E-05 | global batch size: 32 | lm loss: 6.598960E+00 | loss scale: 16384.0 | grad norm: 96783.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3199/ 159576 | consumed samples: 55120 | elapsed time per iteration (ms): 14540.7 | learning rate: 1.527E-05 | global batch size: 32 | lm loss: 6.572606E+00 | loss scale: 16384.0 | grad norm: 89417.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3200/ 159576 | consumed samples: 55152 | elapsed time per iteration (ms): 15008.9 | learning rate: 1.528E-05 | global batch size: 32 | lm loss: 6.506562E+00 | loss scale: 16384.0 | grad norm: 41993.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3201/ 159576 | consumed samples: 55184 | elapsed time per iteration (ms): 14658.0 | learning rate: 1.529E-05 | global batch size: 32 | lm loss: 6.782739E+00 | loss scale: 16384.0 | grad norm: 352113.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3202/ 159576 | consumed samples: 55216 | elapsed time per iteration (ms): 14567.2 | learning rate: 1.530E-05 | global batch size: 32 | lm loss: 6.567737E+00 | loss scale: 16384.0 | grad norm: 255563.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3203/ 159576 | consumed samples: 55248 | elapsed time per iteration (ms): 14521.2 | learning rate: 1.531E-05 | global batch size: 32 | lm loss: 6.758952E+00 | loss scale: 16384.0 | grad norm: 132639.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3204/ 159576 | consumed samples: 55280 | elapsed time per 
iteration (ms): 15057.0 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.644050E+00 | loss scale: 16384.0 | grad norm: 95206.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3205/ 159576 | consumed samples: 55312 | elapsed time per iteration (ms): 14632.3 | learning rate: 1.532E-05 | global batch size: 32 | lm loss: 6.559070E+00 | loss scale: 16384.0 | grad norm: 92448.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3206/ 159576 | consumed samples: 55344 | elapsed time per iteration (ms): 14560.7 | learning rate: 1.533E-05 | global batch size: 32 | lm loss: 6.544364E+00 | loss scale: 16384.0 | grad norm: 87185.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3207/ 159576 | consumed samples: 55376 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.534E-05 | global batch size: 32 | lm loss: 6.617725E+00 | loss scale: 16384.0 | grad norm: 147534.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3208/ 159576 | consumed samples: 55408 | elapsed time per iteration (ms): 14919.1 | learning rate: 1.535E-05 | global batch size: 32 | lm loss: 6.505226E+00 | loss scale: 16384.0 | grad norm: 82317.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3209/ 159576 | consumed samples: 55440 | elapsed time per iteration (ms): 14628.9 | learning rate: 1.536E-05 | global batch size: 32 | lm loss: 6.529959E+00 | loss scale: 16384.0 | grad norm: 62063.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3210/ 159576 | consumed samples: 55472 | elapsed time per iteration (ms): 14562.8 | learning rate: 1.537E-05 | global batch size: 32 | lm loss: 6.499523E+00 | loss scale: 16384.0 | grad norm: 59027.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3211/ 159576 | consumed samples: 55504 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.538E-05 | global batch size: 32 | lm loss: 6.612097E+00 | loss scale: 16384.0 | grad norm: 142076.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3212/ 159576 | consumed samples: 55536 | elapsed time per iteration (ms): 14906.9 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.726549E+00 | loss scale: 16384.0 | grad norm: 85971.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3213/ 159576 | consumed samples: 55568 | elapsed time per iteration (ms): 14484.4 | learning rate: 1.539E-05 | global batch size: 32 | lm loss: 6.627134E+00 | loss scale: 16384.0 | grad norm: 74784.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3214/ 159576 | consumed samples: 55600 | elapsed time per iteration (ms): 14568.5 | learning rate: 1.540E-05 | global batch size: 32 | lm loss: 6.684568E+00 | loss scale: 16384.0 | grad norm: 85537.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3215/ 159576 | consumed samples: 55632 | elapsed time per iteration (ms): 14541.7 | learning rate: 1.541E-05 | global batch size: 32 | lm loss: 6.632449E+00 | loss scale: 16384.0 | grad norm: 118554.262 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 3216/ 159576 | consumed samples: 55664 | elapsed time per iteration (ms): 14903.9 | learning rate: 1.542E-05 | global batch size: 32 | lm loss: 6.491426E+00 | loss scale: 16384.0 | grad norm: 66361.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3217/ 159576 | consumed samples: 55696 | elapsed time per iteration (ms): 14654.1 | learning rate: 1.543E-05 | global batch size: 32 | lm loss: 6.599683E+00 | loss scale: 16384.0 | grad norm: 66284.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3218/ 159576 | consumed samples: 55728 | elapsed time per iteration (ms): 14564.4 | learning rate: 1.544E-05 | global batch size: 32 | lm loss: 6.671634E+00 | loss scale: 16384.0 | grad norm: 48626.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3219/ 159576 | consumed samples: 55760 | elapsed time per iteration (ms): 14567.8 | learning rate: 1.545E-05 | global batch size: 32 | lm loss: 6.653804E+00 | loss scale: 16384.0 | grad norm: 84407.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3220/ 159576 | consumed samples: 55792 | elapsed time per iteration (ms): 14939.3 | learning rate: 1.546E-05 | global batch size: 32 | lm loss: 6.519379E+00 | loss scale: 16384.0 | grad norm: 72885.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3221/ 159576 | consumed samples: 55824 | elapsed time per iteration (ms): 14579.8 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.658468E+00 | loss scale: 16384.0 | grad norm: 69063.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3222/ 159576 | consumed samples: 55856 | elapsed time per iteration (ms): 14568.3 | learning rate: 1.547E-05 | global batch size: 32 | lm loss: 6.544227E+00 | loss scale: 16384.0 | grad norm: 94167.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3223/ 159576 | consumed samples: 55888 | elapsed time per iteration (ms): 14530.3 | learning rate: 1.548E-05 | global batch size: 32 | lm loss: 6.519998E+00 | loss scale: 16384.0 | grad norm: 74630.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3224/ 159576 | consumed samples: 55920 | elapsed time per iteration (ms): 14849.7 | learning rate: 1.549E-05 | global batch size: 32 | lm loss: 6.586551E+00 | loss scale: 16384.0 | grad norm: 76630.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3225/ 159576 | consumed samples: 55952 | elapsed time per iteration (ms): 14888.8 | learning rate: 1.550E-05 | global batch size: 32 | lm loss: 6.687891E+00 | loss scale: 16384.0 | grad norm: 70630.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3226/ 159576 | consumed samples: 55984 | elapsed time per iteration (ms): 14540.3 | learning rate: 1.551E-05 | global batch size: 32 | lm loss: 6.595382E+00 | loss scale: 16384.0 | grad norm: 92178.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3227/ 159576 | consumed samples: 56016 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.552E-05 | global batch size: 32 | lm loss: 
6.364616E+00 | loss scale: 16384.0 | grad norm: 62395.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3228/ 159576 | consumed samples: 56048 | elapsed time per iteration (ms): 14547.2 | learning rate: 1.553E-05 | global batch size: 32 | lm loss: 6.614971E+00 | loss scale: 16384.0 | grad norm: 72348.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3229/ 159576 | consumed samples: 56080 | elapsed time per iteration (ms): 14765.8 | learning rate: 1.554E-05 | global batch size: 32 | lm loss: 6.527470E+00 | loss scale: 16384.0 | grad norm: 70068.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3230/ 159576 | consumed samples: 56112 | elapsed time per iteration (ms): 14547.7 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.691795E+00 | loss scale: 16384.0 | grad norm: 79540.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3231/ 159576 | consumed samples: 56144 | elapsed time per iteration (ms): 14659.9 | learning rate: 1.555E-05 | global batch size: 32 | lm loss: 6.541613E+00 | loss scale: 16384.0 | grad norm: 49841.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3232/ 159576 | consumed samples: 56176 | elapsed time per iteration (ms): 14501.9 | learning rate: 1.556E-05 | global batch size: 32 | lm loss: 6.634310E+00 | loss scale: 16384.0 | grad norm: 67541.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3233/ 159576 | consumed samples: 56208 | elapsed time per iteration (ms): 14751.5 | learning rate: 1.557E-05 | global batch size: 32 | lm loss: 6.538262E+00 | loss scale: 16384.0 | grad norm: 60234.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3234/ 159576 | consumed samples: 56240 | elapsed time per iteration (ms): 14540.9 | learning rate: 1.558E-05 | global batch size: 32 | lm loss: 6.572741E+00 | loss scale: 16384.0 | grad norm: 51996.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3235/ 159576 | consumed samples: 56272 | elapsed time per iteration (ms): 14525.6 | learning rate: 1.559E-05 | global batch size: 32 | lm loss: 6.514688E+00 | loss scale: 16384.0 | grad norm: 80129.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3236/ 159576 | consumed samples: 56304 | elapsed time per iteration (ms): 14525.2 | learning rate: 1.560E-05 | global batch size: 32 | lm loss: 6.597489E+00 | loss scale: 16384.0 | grad norm: 106848.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3237/ 159576 | consumed samples: 56336 | elapsed time per iteration (ms): 14776.9 | learning rate: 1.561E-05 | global batch size: 32 | lm loss: 6.556981E+00 | loss scale: 16384.0 | grad norm: 71439.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3238/ 159576 | consumed samples: 56368 | elapsed time per iteration (ms): 14561.5 | learning rate: 1.562E-05 | global batch size: 32 | lm loss: 6.569613E+00 | loss scale: 16384.0 | grad norm: 70525.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3239/ 159576 | consumed samples: 
56400 | elapsed time per iteration (ms): 14478.4 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.541091E+00 | loss scale: 16384.0 | grad norm: 47017.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3240/ 159576 | consumed samples: 56432 | elapsed time per iteration (ms): 14587.1 | learning rate: 1.563E-05 | global batch size: 32 | lm loss: 6.697134E+00 | loss scale: 16384.0 | grad norm: 53866.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3241/ 159576 | consumed samples: 56464 | elapsed time per iteration (ms): 14901.2 | learning rate: 1.564E-05 | global batch size: 32 | lm loss: 6.463998E+00 | loss scale: 16384.0 | grad norm: 72517.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3242/ 159576 | consumed samples: 56496 | elapsed time per iteration (ms): 14602.2 | learning rate: 1.565E-05 | global batch size: 32 | lm loss: 6.557918E+00 | loss scale: 16384.0 | grad norm: 51986.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3243/ 159576 | consumed samples: 56528 | elapsed time per iteration (ms): 14553.6 | learning rate: 1.566E-05 | global batch size: 32 | lm loss: 6.491773E+00 | loss scale: 16384.0 | grad norm: 68222.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3244/ 159576 | consumed samples: 56560 | elapsed time per iteration (ms): 14559.7 | learning rate: 1.567E-05 | global batch size: 32 | lm loss: 6.590208E+00 | loss scale: 16384.0 | grad norm: 72691.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3245/ 159576 | consumed samples: 56592 | elapsed time per iteration (ms): 14894.6 | learning rate: 1.568E-05 | global batch size: 32 | lm loss: 6.551069E+00 | loss scale: 16384.0 | grad norm: 71227.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3246/ 159576 | consumed samples: 56624 | elapsed time per iteration (ms): 14706.4 | learning rate: 1.569E-05 | global batch size: 32 | lm loss: 6.536276E+00 | loss scale: 16384.0 | grad norm: 77853.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3247/ 159576 | consumed samples: 56656 | elapsed time per iteration (ms): 14557.1 | learning rate: 1.570E-05 | global batch size: 32 | lm loss: 6.547366E+00 | loss scale: 16384.0 | grad norm: 91853.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3248/ 159576 | consumed samples: 56688 | elapsed time per iteration (ms): 14512.9 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.604490E+00 | loss scale: 16384.0 | grad norm: 61725.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3249/ 159576 | consumed samples: 56720 | elapsed time per iteration (ms): 14949.1 | learning rate: 1.571E-05 | global batch size: 32 | lm loss: 6.555557E+00 | loss scale: 16384.0 | grad norm: 55414.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3250/ 159576 | consumed samples: 56752 | elapsed time per iteration (ms): 14468.6 | learning rate: 1.572E-05 | global batch size: 32 | lm loss: 6.471034E+00 | loss scale: 16384.0 | grad norm: 39264.272 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3251/ 159576 | consumed samples: 56784 | elapsed time per iteration (ms): 14601.9 | learning rate: 1.573E-05 | global batch size: 32 | lm loss: 6.472137E+00 | loss scale: 16384.0 | grad norm: 51720.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3252/ 159576 | consumed samples: 56816 | elapsed time per iteration (ms): 14481.3 | learning rate: 1.574E-05 | global batch size: 32 | lm loss: 6.564797E+00 | loss scale: 16384.0 | grad norm: 55129.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3253/ 159576 | consumed samples: 56848 | elapsed time per iteration (ms): 14865.7 | learning rate: 1.575E-05 | global batch size: 32 | lm loss: 6.433147E+00 | loss scale: 16384.0 | grad norm: 48761.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3254/ 159576 | consumed samples: 56880 | elapsed time per iteration (ms): 14607.7 | learning rate: 1.576E-05 | global batch size: 32 | lm loss: 6.486347E+00 | loss scale: 16384.0 | grad norm: 51447.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3255/ 159576 | consumed samples: 56912 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.577E-05 | global batch size: 32 | lm loss: 6.670080E+00 | loss scale: 16384.0 | grad norm: 49692.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3256/ 159576 | consumed samples: 56944 | elapsed time per iteration (ms): 14532.2 | learning rate: 1.578E-05 | global batch size: 32 | lm loss: 6.449496E+00 | loss scale: 16384.0 | grad norm: 46597.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3257/ 159576 | consumed samples: 56976 | elapsed time per iteration (ms): 14907.4 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.651023E+00 | loss scale: 16384.0 | grad norm: 50509.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3258/ 159576 | consumed samples: 57008 | elapsed time per iteration (ms): 14521.0 | learning rate: 1.579E-05 | global batch size: 32 | lm loss: 6.557060E+00 | loss scale: 16384.0 | grad norm: 46431.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3259/ 159576 | consumed samples: 57040 | elapsed time per iteration (ms): 14527.8 | learning rate: 1.580E-05 | global batch size: 32 | lm loss: 6.802115E+00 | loss scale: 16384.0 | grad norm: 46019.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3260/ 159576 | consumed samples: 57072 | elapsed time per iteration (ms): 14560.3 | learning rate: 1.581E-05 | global batch size: 32 | lm loss: 6.480462E+00 | loss scale: 16384.0 | grad norm: 54023.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3261/ 159576 | consumed samples: 57104 | elapsed time per iteration (ms): 14898.0 | learning rate: 1.582E-05 | global batch size: 32 | lm loss: 6.696016E+00 | loss scale: 16384.0 | grad norm: 51541.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3262/ 159576 | consumed samples: 57136 | elapsed time per iteration (ms): 14574.6 | learning rate: 1.583E-05 | global batch 
size: 32 | lm loss: 6.633371E+00 | loss scale: 16384.0 | grad norm: 64314.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3263/ 159576 | consumed samples: 57168 | elapsed time per iteration (ms): 14524.2 | learning rate: 1.584E-05 | global batch size: 32 | lm loss: 6.540409E+00 | loss scale: 16384.0 | grad norm: 53098.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3264/ 159576 | consumed samples: 57200 | elapsed time per iteration (ms): 14557.6 | learning rate: 1.585E-05 | global batch size: 32 | lm loss: 6.376970E+00 | loss scale: 32768.0 | grad norm: 75107.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3265/ 159576 | consumed samples: 57232 | elapsed time per iteration (ms): 14784.4 | learning rate: 1.586E-05 | global batch size: 32 | lm loss: 6.602743E+00 | loss scale: 32768.0 | grad norm: 125297.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3266/ 159576 | consumed samples: 57264 | elapsed time per iteration (ms): 14634.8 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.514446E+00 | loss scale: 32768.0 | grad norm: 194672.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3267/ 159576 | consumed samples: 57296 | elapsed time per iteration (ms): 14570.9 | learning rate: 1.587E-05 | global batch size: 32 | lm loss: 6.630837E+00 | loss scale: 32768.0 | grad norm: 107205.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3268/ 159576 | consumed samples: 57328 | elapsed time per iteration (ms): 14454.1 | learning rate: 1.588E-05 | global batch size: 32 | lm loss: 6.541512E+00 | loss scale: 32768.0 | grad norm: 112309.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3269/ 159576 | consumed samples: 57360 | elapsed time per iteration (ms): 14551.3 | learning rate: 1.589E-05 | global batch size: 32 | lm loss: 6.542883E+00 | loss scale: 32768.0 | grad norm: 132672.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3270/ 159576 | consumed samples: 57392 | elapsed time per iteration (ms): 14718.7 | learning rate: 1.590E-05 | global batch size: 32 | lm loss: 6.448256E+00 | loss scale: 32768.0 | grad norm: 151950.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3271/ 159576 | consumed samples: 57424 | elapsed time per iteration (ms): 14527.0 | learning rate: 1.591E-05 | global batch size: 32 | lm loss: 6.688755E+00 | loss scale: 32768.0 | grad norm: 91675.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3272/ 159576 | consumed samples: 57456 | elapsed time per iteration (ms): 14559.6 | learning rate: 1.592E-05 | global batch size: 32 | lm loss: 6.550324E+00 | loss scale: 32768.0 | grad norm: 241437.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3273/ 159576 | consumed samples: 57488 | elapsed time per iteration (ms): 14521.4 | learning rate: 1.593E-05 | global batch size: 32 | lm loss: 6.620804E+00 | loss scale: 32768.0 | grad norm: 130842.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3274/ 
159576 | consumed samples: 57520 | elapsed time per iteration (ms): 14697.5 | learning rate: 1.594E-05 | global batch size: 32 | lm loss: 6.459725E+00 | loss scale: 32768.0 | grad norm: 146465.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3275/ 159576 | consumed samples: 57552 | elapsed time per iteration (ms): 14476.2 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.576751E+00 | loss scale: 32768.0 | grad norm: 114711.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3276/ 159576 | consumed samples: 57584 | elapsed time per iteration (ms): 14512.4 | learning rate: 1.595E-05 | global batch size: 32 | lm loss: 6.599717E+00 | loss scale: 32768.0 | grad norm: 283220.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3277/ 159576 | consumed samples: 57616 | elapsed time per iteration (ms): 14565.0 | learning rate: 1.596E-05 | global batch size: 32 | lm loss: 6.395351E+00 | loss scale: 32768.0 | grad norm: 206105.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3278/ 159576 | consumed samples: 57648 | elapsed time per iteration (ms): 14816.8 | learning rate: 1.597E-05 | global batch size: 32 | lm loss: 6.569580E+00 | loss scale: 32768.0 | grad norm: 183586.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3279/ 159576 | consumed samples: 57680 | elapsed time per iteration (ms): 14615.5 | learning rate: 1.598E-05 | global batch size: 32 | lm loss: 6.572281E+00 | loss scale: 32768.0 | grad norm: 161878.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3280/ 159576 | consumed samples: 57712 | elapsed time per iteration (ms): 14521.1 | learning rate: 1.599E-05 | global batch size: 32 | lm loss: 6.513469E+00 | loss scale: 32768.0 | grad norm: 134922.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3281/ 159576 | consumed samples: 57744 | elapsed time per iteration (ms): 14549.6 | learning rate: 1.600E-05 | global batch size: 32 | lm loss: 6.680450E+00 | loss scale: 32768.0 | grad norm: 214593.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3282/ 159576 | consumed samples: 57776 | elapsed time per iteration (ms): 14885.6 | learning rate: 1.601E-05 | global batch size: 32 | lm loss: 6.528894E+00 | loss scale: 32768.0 | grad norm: 136120.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3283/ 159576 | consumed samples: 57808 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.602E-05 | global batch size: 32 | lm loss: 6.610715E+00 | loss scale: 32768.0 | grad norm: 124689.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3284/ 159576 | consumed samples: 57840 | elapsed time per iteration (ms): 14446.0 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.493599E+00 | loss scale: 32768.0 | grad norm: 193703.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3285/ 159576 | consumed samples: 57872 | elapsed time per iteration (ms): 14530.4 | learning rate: 1.603E-05 | global batch size: 32 | lm loss: 6.495665E+00 | loss scale: 32768.0 | grad norm: 
180680.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3286/ 159576 | consumed samples: 57904 | elapsed time per iteration (ms): 15079.8 | learning rate: 1.604E-05 | global batch size: 32 | lm loss: 6.484368E+00 | loss scale: 32768.0 | grad norm: 151352.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3287/ 159576 | consumed samples: 57936 | elapsed time per iteration (ms): 14519.7 | learning rate: 1.605E-05 | global batch size: 32 | lm loss: 6.533234E+00 | loss scale: 32768.0 | grad norm: 135972.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3288/ 159576 | consumed samples: 57968 | elapsed time per iteration (ms): 14502.1 | learning rate: 1.606E-05 | global batch size: 32 | lm loss: 6.485931E+00 | loss scale: 32768.0 | grad norm: 175469.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3289/ 159576 | consumed samples: 58000 | elapsed time per iteration (ms): 14650.6 | learning rate: 1.607E-05 | global batch size: 32 | lm loss: 6.588792E+00 | loss scale: 32768.0 | grad norm: 95804.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3290/ 159576 | consumed samples: 58032 | elapsed time per iteration (ms): 15011.0 | learning rate: 1.608E-05 | global batch size: 32 | lm loss: 6.649066E+00 | loss scale: 32768.0 | grad norm: 158912.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3291/ 159576 | consumed samples: 58064 | elapsed time per iteration (ms): 14545.2 | learning rate: 1.609E-05 | global batch size: 32 | lm loss: 6.518328E+00 | loss scale: 32768.0 | grad norm: 143118.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3292/ 159576 | consumed samples: 58096 | elapsed time per iteration (ms): 14548.9 | learning rate: 1.610E-05 | global batch size: 32 | lm loss: 6.497085E+00 | loss scale: 32768.0 | grad norm: 242609.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3293/ 159576 | consumed samples: 58128 | elapsed time per iteration (ms): 14674.4 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.516074E+00 | loss scale: 32768.0 | grad norm: 230563.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3294/ 159576 | consumed samples: 58160 | elapsed time per iteration (ms): 15018.5 | learning rate: 1.611E-05 | global batch size: 32 | lm loss: 6.357250E+00 | loss scale: 32768.0 | grad norm: 145279.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3295/ 159576 | consumed samples: 58192 | elapsed time per iteration (ms): 14502.4 | learning rate: 1.612E-05 | global batch size: 32 | lm loss: 6.532835E+00 | loss scale: 32768.0 | grad norm: 159209.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3296/ 159576 | consumed samples: 58224 | elapsed time per iteration (ms): 14618.1 | learning rate: 1.613E-05 | global batch size: 32 | lm loss: 6.610238E+00 | loss scale: 32768.0 | grad norm: 103662.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3297/ 159576 | consumed samples: 58256 | elapsed time per iteration (ms): 
14641.0 | learning rate: 1.614E-05 | global batch size: 32 | lm loss: 6.559636E+00 | loss scale: 32768.0 | grad norm: 342247.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3298/ 159576 | consumed samples: 58288 | elapsed time per iteration (ms): 14987.0 | learning rate: 1.615E-05 | global batch size: 32 | lm loss: 6.595356E+00 | loss scale: 32768.0 | grad norm: 185444.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3299/ 159576 | consumed samples: 58320 | elapsed time per iteration (ms): 14547.8 | learning rate: 1.616E-05 | global batch size: 32 | lm loss: 6.538537E+00 | loss scale: 32768.0 | grad norm: 145127.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3300/ 159576 | consumed samples: 58352 | elapsed time per iteration (ms): 14643.9 | learning rate: 1.617E-05 | global batch size: 32 | lm loss: 6.453721E+00 | loss scale: 32768.0 | grad norm: 235646.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3301/ 159576 | consumed samples: 58384 | elapsed time per iteration (ms): 14648.1 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.672456E+00 | loss scale: 32768.0 | grad norm: 131805.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3302/ 159576 | consumed samples: 58416 | elapsed time per iteration (ms): 15043.8 | learning rate: 1.618E-05 | global batch size: 32 | lm loss: 6.513996E+00 | loss scale: 32768.0 | grad norm: 172559.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3303/ 159576 | consumed samples: 58448 | elapsed time per iteration (ms): 14557.7 | learning rate: 1.619E-05 | global batch size: 32 | lm loss: 6.688443E+00 | loss scale: 32768.0 | grad norm: 154181.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3304/ 159576 | consumed samples: 58480 | elapsed time per iteration (ms): 14541.6 | learning rate: 1.620E-05 | global batch size: 32 | lm loss: 6.865191E+00 | loss scale: 32768.0 | grad norm: 171141.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3305/ 159576 | consumed samples: 58512 | elapsed time per iteration (ms): 14558.8 | learning rate: 1.621E-05 | global batch size: 32 | lm loss: 6.529626E+00 | loss scale: 32768.0 | grad norm: 112641.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3306/ 159576 | consumed samples: 58544 | elapsed time per iteration (ms): 14971.5 | learning rate: 1.622E-05 | global batch size: 32 | lm loss: 6.571610E+00 | loss scale: 32768.0 | grad norm: 115411.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3307/ 159576 | consumed samples: 58576 | elapsed time per iteration (ms): 14532.6 | learning rate: 1.623E-05 | global batch size: 32 | lm loss: 6.792900E+00 | loss scale: 32768.0 | grad norm: 153224.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3308/ 159576 | consumed samples: 58608 | elapsed time per iteration (ms): 14639.5 | learning rate: 1.624E-05 | global batch size: 32 | lm loss: 6.490854E+00 | loss scale: 32768.0 | grad norm: 125276.183 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 3309/ 159576 | consumed samples: 58640 | elapsed time per iteration (ms): 14639.4 | learning rate: 1.625E-05 | global batch size: 32 | lm loss: 6.604795E+00 | loss scale: 32768.0 | grad norm: 163307.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3310/ 159576 | consumed samples: 58672 | elapsed time per iteration (ms): 14641.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.486001E+00 | loss scale: 32768.0 | grad norm: 169732.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3311/ 159576 | consumed samples: 58704 | elapsed time per iteration (ms): 14763.3 | learning rate: 1.626E-05 | global batch size: 32 | lm loss: 6.513995E+00 | loss scale: 32768.0 | grad norm: 106129.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3312/ 159576 | consumed samples: 58736 | elapsed time per iteration (ms): 14481.4 | learning rate: 1.627E-05 | global batch size: 32 | lm loss: 6.538834E+00 | loss scale: 32768.0 | grad norm: 143827.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3313/ 159576 | consumed samples: 58768 | elapsed time per iteration (ms): 14535.0 | learning rate: 1.628E-05 | global batch size: 32 | lm loss: 6.508898E+00 | loss scale: 32768.0 | grad norm: 96517.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3314/ 159576 | consumed samples: 58800 | elapsed time per iteration (ms): 14389.3 | learning rate: 1.629E-05 | global batch size: 32 | lm loss: 6.557344E+00 | loss scale: 32768.0 | grad norm: 160647.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3315/ 159576 | consumed samples: 58832 | elapsed time per iteration (ms): 14617.9 | learning rate: 1.630E-05 | global batch size: 32 | lm loss: 6.579730E+00 | loss scale: 32768.0 | grad norm: 166511.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3316/ 159576 | consumed samples: 58864 | elapsed time per iteration (ms): 14527.6 | learning rate: 1.631E-05 | global batch size: 32 | lm loss: 6.510201E+00 | loss scale: 32768.0 | grad norm: 147882.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3317/ 159576 | consumed samples: 58896 | elapsed time per iteration (ms): 14470.3 | learning rate: 1.632E-05 | global batch size: 32 | lm loss: 6.570679E+00 | loss scale: 32768.0 | grad norm: 133948.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3318/ 159576 | consumed samples: 58928 | elapsed time per iteration (ms): 14503.9 | learning rate: 1.633E-05 | global batch size: 32 | lm loss: 6.505450E+00 | loss scale: 32768.0 | grad norm: 117987.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3319/ 159576 | consumed samples: 58960 | elapsed time per iteration (ms): 14576.7 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 6.637349E+00 | loss scale: 32768.0 | grad norm: 158753.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3320/ 159576 | consumed samples: 58992 | elapsed time per iteration (ms): 14474.5 | learning rate: 1.634E-05 | global batch size: 32 | lm loss: 
6.463197E+00 | loss scale: 32768.0 | grad norm: 133223.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3321/ 159576 | consumed samples: 59024 | elapsed time per iteration (ms): 14495.2 | learning rate: 1.635E-05 | global batch size: 32 | lm loss: 6.754025E+00 | loss scale: 32768.0 | grad norm: 147882.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3322/ 159576 | consumed samples: 59056 | elapsed time per iteration (ms): 14426.8 | learning rate: 1.636E-05 | global batch size: 32 | lm loss: 6.377756E+00 | loss scale: 32768.0 | grad norm: 107176.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3323/ 159576 | consumed samples: 59088 | elapsed time per iteration (ms): 14894.2 | learning rate: 1.637E-05 | global batch size: 32 | lm loss: 6.485399E+00 | loss scale: 32768.0 | grad norm: 104276.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3324/ 159576 | consumed samples: 59120 | elapsed time per iteration (ms): 14539.8 | learning rate: 1.638E-05 | global batch size: 32 | lm loss: 6.595620E+00 | loss scale: 32768.0 | grad norm: 102253.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3325/ 159576 | consumed samples: 59152 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.639E-05 | global batch size: 32 | lm loss: 6.372971E+00 | loss scale: 32768.0 | grad norm: 170203.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3326/ 159576 | consumed samples: 59184 | elapsed time per iteration (ms): 14629.3 | learning rate: 1.640E-05 | global batch size: 32 | lm loss: 6.460327E+00 | loss scale: 32768.0 | grad norm: 108888.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3327/ 159576 | consumed samples: 59216 | elapsed time per iteration (ms): 15011.9 | learning rate: 1.641E-05 | global batch size: 32 | lm loss: 6.462082E+00 | loss scale: 32768.0 | grad norm: 154915.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3328/ 159576 | consumed samples: 59248 | elapsed time per iteration (ms): 14457.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.526529E+00 | loss scale: 32768.0 | grad norm: 135486.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3329/ 159576 | consumed samples: 59280 | elapsed time per iteration (ms): 14493.0 | learning rate: 1.642E-05 | global batch size: 32 | lm loss: 6.546029E+00 | loss scale: 32768.0 | grad norm: 97252.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3330/ 159576 | consumed samples: 59312 | elapsed time per iteration (ms): 14488.7 | learning rate: 1.643E-05 | global batch size: 32 | lm loss: 6.540400E+00 | loss scale: 32768.0 | grad norm: 234564.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3331/ 159576 | consumed samples: 59344 | elapsed time per iteration (ms): 14982.7 | learning rate: 1.644E-05 | global batch size: 32 | lm loss: 6.473689E+00 | loss scale: 32768.0 | grad norm: 104411.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3332/ 159576 | consumed 
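The sampled rows above come straight out of Megatron-style `iteration N/ total | key: value | ...` records. For anyone wanting to reproduce this kind of condensed view, here is a minimal parsing sketch; it is not part of the training scripts, and the field order is assumed to match the records quoted in this log:

```python
import re

# Minimal sketch (not from the training repo): extract the key metrics from
# Megatron-style iteration records, assuming the field order seen in this log.
ITER_RE = re.compile(
    r"iteration\s+(\d+)/\s*\d+\s*\|"
    r"\s*consumed samples:\s*(\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*([\d.]+)\s*\|"
    r"\s*learning rate:\s*([\d.E+-]+)\s*\|"
    r"\s*global batch size:\s*(\d+)\s*\|"
    r"\s*lm loss:\s*([\d.E+-]+)\s*\|"
    r"\s*loss scale:\s*([\d.]+)\s*\|"
    r"\s*grad norm:\s*([\d.]+)"
)

def parse_records(log_text):
    """Yield one dict per iteration record found anywhere in log_text."""
    for m in ITER_RE.finditer(log_text):
        yield {
            "iteration": int(m.group(1)),
            "consumed_samples": int(m.group(2)),
            "ms_per_iter": float(m.group(3)),
            "learning_rate": float(m.group(4)),
            "global_batch_size": int(m.group(5)),
            "lm_loss": float(m.group(6)),
            "loss_scale": float(m.group(7)),
            "grad_norm": float(m.group(8)),
        }

if __name__ == "__main__":
    sample = ("iteration     3264/  159576 | consumed samples: 57200 | "
              "elapsed time per iteration (ms): 14557.6 | learning rate: "
              "1.585E-05 | global batch size: 32 | lm loss: 6.376970E+00 | "
              "loss scale: 32768.0 | grad norm: 75107.502")
    print(next(parse_records(sample)))
```

Because the pattern matches records anywhere in the text rather than at line starts, it also works on flattened dumps like this one, where several records ended up fused onto a single line.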
iteration | consumed samples | ms/iter | learning rate |  lm loss | loss scale |  grad norm
     3332 |            59376 | 14455.1 |     1.645E-05 | 6.589927 |    32768.0 | 240696.234

[2021-09-24 15:06:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 15:06:48] PULSE: tr8-104B is running for 9:14:37 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
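The job id `1165978_[1-10%1]` in the first PULSE line is SLURM's rendering of a job array with a `%1` throttle: at most one array task runs at a time, so each training chunk starts only after the previous one finishes, which appears to be the "dependency mechanism" the message refers to. A hypothetical submission along those lines (the script name and chunk count here are illustrative assumptions, not read from the log):

```python
import shlex
import subprocess

def submit_chain(script="tr8-104B.slurm", n_chunks=10):
    """Submit n_chunks array tasks that SLURM will run strictly one at a time.

    The '%1' throttle in --array caps concurrency at one task, so the pending
    array shows up in squeue as something like 1165978_[1-10%1].
    """
    # Both arguments are assumptions for illustration; the real submission
    # script and chunk count are not visible in this log.
    cmd = f"sbatch --array=1-{n_chunks}%1 {script}"
    subprocess.run(shlex.split(cmd), check=True)
```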
iteration | consumed samples | ms/iter | learning rate |  lm loss | loss scale |  grad norm
     3335 |            59472 | 14896.4 |     1.648E-05 | 6.456505 |    32768.0 | 145244.969
     3341 |            59664 | 14563.2 |     1.653E-05 | 6.658302 |    32768.0 | 113653.366
     3345 |            59792 | 14580.5 |     1.657E-05 | 6.494667 |    32768.0 | 125366.936
     3355 |            60112 | 14609.6 |     1.666E-05 | 6.582976 |    32768.0 | 125934.744
     3365 |            60432 | 14663.0 |     1.674E-05 | 6.613792 |    32768.0 | 103982.392
     3375 |            60752 | 14542.0 |     1.683E-05 | 6.559402 |    32768.0 | 139207.917
     3376 |            60784 | 14908.5 |     1.684E-05 | 6.450352 |    32768.0 | 132808.916

[the iteration 3376 record is cut off mid-field at the end of this section; the loss scale holds at 32768.0 and no skipped or nan iterations are reported through iteration 3376.]
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3377/ 159576 | consumed samples: 60816 | elapsed time per iteration (ms): 14576.3 | learning rate: 1.685E-05 | global batch size: 32 | lm loss: 6.365215E+00 | loss scale: 32768.0 | grad norm: 176292.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3378/ 159576 | consumed samples: 60848 | elapsed time per iteration (ms): 14602.1 | learning rate: 1.686E-05 | global batch size: 32 | lm loss: 6.443403E+00 | loss scale: 32768.0 | grad norm: 123052.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3379/ 159576 | consumed samples: 60880 | elapsed time per iteration (ms): 14651.7 | learning rate: 1.687E-05 | global batch size: 32 | lm loss: 6.502498E+00 | loss scale: 32768.0 | grad norm: 100381.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3380/ 159576 | consumed samples: 60912 | elapsed time per iteration (ms): 14854.4 | learning rate: 1.688E-05 | global batch size: 32 | lm loss: 6.296595E+00 | loss scale: 32768.0 | grad norm: 110161.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3381/ 159576 | consumed samples: 60944 | elapsed time per iteration (ms): 14541.8 | learning rate: 1.689E-05 | global batch size: 32 | lm loss: 6.563570E+00 | loss scale: 32768.0 | grad norm: 88591.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3382/ 159576 | consumed samples: 60976 | elapsed time per iteration (ms): 14608.6 | learning rate: 1.689E-05 | global batch size: 32 | lm loss: 6.582268E+00 | loss scale: 32768.0 | grad norm: 114214.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3383/ 159576 | consumed samples: 61008 | elapsed time per iteration (ms): 14527.6 | learning rate: 1.690E-05 | global batch size: 32 | lm loss: 6.577205E+00 | loss scale: 32768.0 | grad norm: 122437.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3384/ 159576 | consumed samples: 61040 | elapsed time per iteration (ms): 14914.6 | learning rate: 1.691E-05 | global batch size: 32 | lm loss: 6.428950E+00 | loss scale: 32768.0 | grad norm: 125848.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3385/ 159576 | consumed samples: 61072 | elapsed time per iteration (ms): 14662.1 | learning rate: 1.692E-05 | global batch size: 32 | lm loss: 6.677817E+00 | loss scale: 32768.0 | grad norm: 110496.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3386/ 159576 | consumed samples: 61104 | elapsed time per iteration (ms): 14566.3 | learning rate: 1.693E-05 | global batch size: 32 | lm loss: 6.704777E+00 | loss scale: 32768.0 | grad norm: 128540.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3387/ 159576 | consumed samples: 61136 | elapsed time per iteration (ms): 14563.5 | learning rate: 1.694E-05 | global batch size: 32 | lm loss: 6.578674E+00 | loss scale: 32768.0 | grad norm: 143780.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3388/ 159576 | consumed samples: 61168 | elapsed time per iteration (ms): 14890.7 | learning rate: 1.695E-05 | global batch 
size: 32 | lm loss: 6.503931E+00 | loss scale: 32768.0 | grad norm: 144574.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3389/ 159576 | consumed samples: 61200 | elapsed time per iteration (ms): 14672.5 | learning rate: 1.696E-05 | global batch size: 32 | lm loss: 6.662019E+00 | loss scale: 32768.0 | grad norm: 158358.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3390/ 159576 | consumed samples: 61232 | elapsed time per iteration (ms): 14563.8 | learning rate: 1.697E-05 | global batch size: 32 | lm loss: 6.577336E+00 | loss scale: 32768.0 | grad norm: 198110.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3391/ 159576 | consumed samples: 61264 | elapsed time per iteration (ms): 14556.6 | learning rate: 1.697E-05 | global batch size: 32 | lm loss: 6.480102E+00 | loss scale: 32768.0 | grad norm: 131120.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3392/ 159576 | consumed samples: 61296 | elapsed time per iteration (ms): 14679.5 | learning rate: 1.698E-05 | global batch size: 32 | lm loss: 6.610832E+00 | loss scale: 32768.0 | grad norm: 164581.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3393/ 159576 | consumed samples: 61328 | elapsed time per iteration (ms): 14940.6 | learning rate: 1.699E-05 | global batch size: 32 | lm loss: 6.591301E+00 | loss scale: 32768.0 | grad norm: 109544.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3394/ 159576 | consumed samples: 61360 | elapsed time per iteration (ms): 14592.5 | learning rate: 1.700E-05 | global batch size: 32 | lm loss: 6.572402E+00 | loss scale: 32768.0 | grad norm: 121937.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3395/ 159576 | consumed samples: 61392 | elapsed time per iteration (ms): 14696.4 | learning rate: 1.701E-05 | global batch size: 32 | lm loss: 6.509333E+00 | loss scale: 32768.0 | grad norm: 125128.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3396/ 159576 | consumed samples: 61424 | elapsed time per iteration (ms): 14508.0 | learning rate: 1.702E-05 | global batch size: 32 | lm loss: 6.481079E+00 | loss scale: 32768.0 | grad norm: 111910.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3397/ 159576 | consumed samples: 61456 | elapsed time per iteration (ms): 14790.4 | learning rate: 1.703E-05 | global batch size: 32 | lm loss: 6.548109E+00 | loss scale: 32768.0 | grad norm: 98717.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3398/ 159576 | consumed samples: 61488 | elapsed time per iteration (ms): 14622.0 | learning rate: 1.704E-05 | global batch size: 32 | lm loss: 6.769459E+00 | loss scale: 32768.0 | grad norm: 117754.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3399/ 159576 | consumed samples: 61520 | elapsed time per iteration (ms): 14611.9 | learning rate: 1.705E-05 | global batch size: 32 | lm loss: 6.555518E+00 | loss scale: 32768.0 | grad norm: 122435.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3400/ 
159576 | consumed samples: 61552 | elapsed time per iteration (ms): 14673.6 | learning rate: 1.705E-05 | global batch size: 32 | lm loss: 6.464739E+00 | loss scale: 32768.0 | grad norm: 119112.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3401/ 159576 | consumed samples: 61584 | elapsed time per iteration (ms): 14910.7 | learning rate: 1.706E-05 | global batch size: 32 | lm loss: 6.473111E+00 | loss scale: 32768.0 | grad norm: 113410.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3402/ 159576 | consumed samples: 61616 | elapsed time per iteration (ms): 14645.2 | learning rate: 1.707E-05 | global batch size: 32 | lm loss: 6.476302E+00 | loss scale: 32768.0 | grad norm: 113730.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3403/ 159576 | consumed samples: 61648 | elapsed time per iteration (ms): 14580.6 | learning rate: 1.708E-05 | global batch size: 32 | lm loss: 6.449226E+00 | loss scale: 32768.0 | grad norm: 82819.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3404/ 159576 | consumed samples: 61680 | elapsed time per iteration (ms): 14600.7 | learning rate: 1.709E-05 | global batch size: 32 | lm loss: 6.560233E+00 | loss scale: 32768.0 | grad norm: 134696.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3405/ 159576 | consumed samples: 61712 | elapsed time per iteration (ms): 14772.7 | learning rate: 1.710E-05 | global batch size: 32 | lm loss: 6.546908E+00 | loss scale: 32768.0 | grad norm: 101163.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3406/ 159576 | consumed samples: 61744 | elapsed time per iteration (ms): 14593.3 | learning rate: 1.711E-05 | global batch size: 32 | lm loss: 6.541033E+00 | loss scale: 32768.0 | grad norm: 109699.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3407/ 159576 | consumed samples: 61776 | elapsed time per iteration (ms): 14624.0 | learning rate: 1.712E-05 | global batch size: 32 | lm loss: 6.511957E+00 | loss scale: 32768.0 | grad norm: 91123.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3408/ 159576 | consumed samples: 61808 | elapsed time per iteration (ms): 14724.5 | learning rate: 1.713E-05 | global batch size: 32 | lm loss: 6.628172E+00 | loss scale: 32768.0 | grad norm: 121584.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3409/ 159576 | consumed samples: 61840 | elapsed time per iteration (ms): 15120.6 | learning rate: 1.713E-05 | global batch size: 32 | lm loss: 6.578444E+00 | loss scale: 32768.0 | grad norm: 116757.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3410/ 159576 | consumed samples: 61872 | elapsed time per iteration (ms): 14619.5 | learning rate: 1.714E-05 | global batch size: 32 | lm loss: 6.415488E+00 | loss scale: 32768.0 | grad norm: 105815.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3411/ 159576 | consumed samples: 61904 | elapsed time per iteration (ms): 14577.8 | learning rate: 1.715E-05 | global batch size: 32 | lm loss: 6.553544E+00 | loss scale: 32768.0 | grad norm: 
104053.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3412/ 159576 | consumed samples: 61936 | elapsed time per iteration (ms): 14587.5 | learning rate: 1.716E-05 | global batch size: 32 | lm loss: 6.435183E+00 | loss scale: 32768.0 | grad norm: 101905.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3413/ 159576 | consumed samples: 61968 | elapsed time per iteration (ms): 14985.9 | learning rate: 1.717E-05 | global batch size: 32 | lm loss: 6.580218E+00 | loss scale: 32768.0 | grad norm: 142325.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3414/ 159576 | consumed samples: 62000 | elapsed time per iteration (ms): 14646.8 | learning rate: 1.718E-05 | global batch size: 32 | lm loss: 6.534802E+00 | loss scale: 32768.0 | grad norm: 109771.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3415/ 159576 | consumed samples: 62032 | elapsed time per iteration (ms): 14644.6 | learning rate: 1.719E-05 | global batch size: 32 | lm loss: 6.582119E+00 | loss scale: 32768.0 | grad norm: 192056.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3416/ 159576 | consumed samples: 62064 | elapsed time per iteration (ms): 14616.1 | learning rate: 1.720E-05 | global batch size: 32 | lm loss: 6.496407E+00 | loss scale: 32768.0 | grad norm: 118953.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3417/ 159576 | consumed samples: 62096 | elapsed time per iteration (ms): 15113.2 | learning rate: 1.721E-05 | global batch size: 32 | lm loss: 6.475505E+00 | loss scale: 32768.0 | grad norm: 173828.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3418/ 159576 | consumed samples: 62128 | elapsed time per iteration (ms): 14635.6 | learning rate: 1.721E-05 | global batch size: 32 | lm loss: 6.318462E+00 | loss scale: 32768.0 | grad norm: 147925.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3419/ 159576 | consumed samples: 62160 | elapsed time per iteration (ms): 14611.3 | learning rate: 1.722E-05 | global batch size: 32 | lm loss: 6.571759E+00 | loss scale: 32768.0 | grad norm: 112885.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3420/ 159576 | consumed samples: 62192 | elapsed time per iteration (ms): 14573.5 | learning rate: 1.723E-05 | global batch size: 32 | lm loss: 6.461047E+00 | loss scale: 32768.0 | grad norm: 135373.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3421/ 159576 | consumed samples: 62224 | elapsed time per iteration (ms): 14978.7 | learning rate: 1.724E-05 | global batch size: 32 | lm loss: 6.554849E+00 | loss scale: 32768.0 | grad norm: 162048.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3422/ 159576 | consumed samples: 62256 | elapsed time per iteration (ms): 14574.6 | learning rate: 1.725E-05 | global batch size: 32 | lm loss: 6.443440E+00 | loss scale: 32768.0 | grad norm: 103393.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3423/ 159576 | consumed samples: 62288 | elapsed time per iteration (ms): 
14578.8 | learning rate: 1.726E-05 | global batch size: 32 | lm loss: 6.490220E+00 | loss scale: 32768.0 | grad norm: 217891.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3424/ 159576 | consumed samples: 62320 | elapsed time per iteration (ms): 14669.3 | learning rate: 1.727E-05 | global batch size: 32 | lm loss: 6.475744E+00 | loss scale: 32768.0 | grad norm: 132019.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3425/ 159576 | consumed samples: 62352 | elapsed time per iteration (ms): 15003.7 | learning rate: 1.728E-05 | global batch size: 32 | lm loss: 6.639316E+00 | loss scale: 32768.0 | grad norm: 118549.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3426/ 159576 | consumed samples: 62384 | elapsed time per iteration (ms): 14473.5 | learning rate: 1.729E-05 | global batch size: 32 | lm loss: 6.529860E+00 | loss scale: 32768.0 | grad norm: 110134.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3427/ 159576 | consumed samples: 62416 | elapsed time per iteration (ms): 14593.0 | learning rate: 1.729E-05 | global batch size: 32 | lm loss: 6.424025E+00 | loss scale: 32768.0 | grad norm: 96948.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3428/ 159576 | consumed samples: 62448 | elapsed time per iteration (ms): 14574.8 | learning rate: 1.730E-05 | global batch size: 32 | lm loss: 6.603945E+00 | loss scale: 32768.0 | grad norm: 108813.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3429/ 159576 | consumed samples: 62480 | elapsed time per iteration (ms): 14962.4 | learning rate: 1.731E-05 | global batch size: 32 | lm loss: 6.519920E+00 | loss scale: 32768.0 | grad norm: 120997.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3430/ 159576 | consumed samples: 62512 | elapsed time per iteration (ms): 14606.5 | learning rate: 1.732E-05 | global batch size: 32 | lm loss: 6.519583E+00 | loss scale: 32768.0 | grad norm: 102226.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3431/ 159576 | consumed samples: 62544 | elapsed time per iteration (ms): 14685.5 | learning rate: 1.733E-05 | global batch size: 32 | lm loss: 6.413152E+00 | loss scale: 32768.0 | grad norm: 146442.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3432/ 159576 | consumed samples: 62576 | elapsed time per iteration (ms): 14642.7 | learning rate: 1.734E-05 | global batch size: 32 | lm loss: 6.416885E+00 | loss scale: 32768.0 | grad norm: 106692.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3433/ 159576 | consumed samples: 62608 | elapsed time per iteration (ms): 14943.4 | learning rate: 1.735E-05 | global batch size: 32 | lm loss: 6.684166E+00 | loss scale: 32768.0 | grad norm: 122647.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3434/ 159576 | consumed samples: 62640 | elapsed time per iteration (ms): 14559.8 | learning rate: 1.736E-05 | global batch size: 32 | lm loss: 6.582661E+00 | loss scale: 32768.0 | grad norm: 143037.633 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 3435/ 159576 | consumed samples: 62672 | elapsed time per iteration (ms): 14581.0 | learning rate: 1.737E-05 | global batch size: 32 | lm loss: 6.459047E+00 | loss scale: 32768.0 | grad norm: 139754.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3436/ 159576 | consumed samples: 62704 | elapsed time per iteration (ms): 14594.3 | learning rate: 1.737E-05 | global batch size: 32 | lm loss: 6.455495E+00 | loss scale: 32768.0 | grad norm: 199133.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3437/ 159576 | consumed samples: 62736 | elapsed time per iteration (ms): 14983.6 | learning rate: 1.738E-05 | global batch size: 32 | lm loss: 6.507184E+00 | loss scale: 32768.0 | grad norm: 193681.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3438/ 159576 | consumed samples: 62768 | elapsed time per iteration (ms): 14797.2 | learning rate: 1.739E-05 | global batch size: 32 | lm loss: 6.461359E+00 | loss scale: 32768.0 | grad norm: 132732.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3439/ 159576 | consumed samples: 62800 | elapsed time per iteration (ms): 14579.8 | learning rate: 1.740E-05 | global batch size: 32 | lm loss: 6.704415E+00 | loss scale: 32768.0 | grad norm: 113391.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3440/ 159576 | consumed samples: 62832 | elapsed time per iteration (ms): 14621.6 | learning rate: 1.741E-05 | global batch size: 32 | lm loss: 6.473897E+00 | loss scale: 32768.0 | grad norm: 120849.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3441/ 159576 | consumed samples: 62864 | elapsed time per iteration (ms): 14686.1 | learning rate: 1.742E-05 | global batch size: 32 | lm loss: 6.459955E+00 | loss scale: 32768.0 | grad norm: 128216.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3442/ 159576 | consumed samples: 62896 | elapsed time per iteration (ms): 14857.9 | learning rate: 1.743E-05 | global batch size: 32 | lm loss: 6.424060E+00 | loss scale: 32768.0 | grad norm: 102672.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3443/ 159576 | consumed samples: 62928 | elapsed time per iteration (ms): 14570.1 | learning rate: 1.744E-05 | global batch size: 32 | lm loss: 6.534360E+00 | loss scale: 32768.0 | grad norm: 184877.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3444/ 159576 | consumed samples: 62960 | elapsed time per iteration (ms): 14620.2 | learning rate: 1.745E-05 | global batch size: 32 | lm loss: 6.629717E+00 | loss scale: 32768.0 | grad norm: 138408.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3445/ 159576 | consumed samples: 62992 | elapsed time per iteration (ms): 14619.1 | learning rate: 1.745E-05 | global batch size: 32 | lm loss: 6.494986E+00 | loss scale: 32768.0 | grad norm: 131634.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3446/ 159576 | consumed samples: 63024 | elapsed time per iteration (ms): 14739.8 | learning rate: 1.746E-05 | global batch size: 32 | lm loss: 
6.529834E+00 | loss scale: 32768.0 | grad norm: 190204.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3447/ 159576 | consumed samples: 63056 | elapsed time per iteration (ms): 14575.9 | learning rate: 1.747E-05 | global batch size: 32 | lm loss: 6.519164E+00 | loss scale: 32768.0 | grad norm: 190893.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3448/ 159576 | consumed samples: 63088 | elapsed time per iteration (ms): 14611.0 | learning rate: 1.748E-05 | global batch size: 32 | lm loss: 6.431557E+00 | loss scale: 32768.0 | grad norm: 127326.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3449/ 159576 | consumed samples: 63120 | elapsed time per iteration (ms): 14615.1 | learning rate: 1.749E-05 | global batch size: 32 | lm loss: 6.213955E+00 | loss scale: 32768.0 | grad norm: 149485.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3450/ 159576 | consumed samples: 63152 | elapsed time per iteration (ms): 14697.2 | learning rate: 1.750E-05 | global batch size: 32 | lm loss: 6.669972E+00 | loss scale: 32768.0 | grad norm: 121418.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3451/ 159576 | consumed samples: 63184 | elapsed time per iteration (ms): 14506.2 | learning rate: 1.751E-05 | global batch size: 32 | lm loss: 6.538607E+00 | loss scale: 32768.0 | grad norm: 160228.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3452/ 159576 | consumed samples: 63216 | elapsed time per iteration (ms): 14518.4 | learning rate: 1.752E-05 | global batch size: 32 | lm loss: 6.466623E+00 | loss scale: 32768.0 | grad norm: 132558.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3453/ 159576 | consumed samples: 63248 | elapsed time per iteration (ms): 14654.4 | learning rate: 1.753E-05 | global batch size: 32 | lm loss: 6.575057E+00 | loss scale: 32768.0 | grad norm: 126715.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3454/ 159576 | consumed samples: 63280 | elapsed time per iteration (ms): 14975.6 | learning rate: 1.753E-05 | global batch size: 32 | lm loss: 6.469002E+00 | loss scale: 32768.0 | grad norm: 134315.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3455/ 159576 | consumed samples: 63312 | elapsed time per iteration (ms): 14595.3 | learning rate: 1.754E-05 | global batch size: 32 | lm loss: 6.471159E+00 | loss scale: 32768.0 | grad norm: 132183.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3456/ 159576 | consumed samples: 63344 | elapsed time per iteration (ms): 14624.6 | learning rate: 1.755E-05 | global batch size: 32 | lm loss: 6.390759E+00 | loss scale: 32768.0 | grad norm: 168993.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3457/ 159576 | consumed samples: 63376 | elapsed time per iteration (ms): 14611.9 | learning rate: 1.756E-05 | global batch size: 32 | lm loss: 6.545074E+00 | loss scale: 32768.0 | grad norm: 116907.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3458/ 159576 | consumed 
samples: 63408 | elapsed time per iteration (ms): 14991.7 | learning rate: 1.757E-05 | global batch size: 32 | lm loss: 6.541002E+00 | loss scale: 32768.0 | grad norm: 144421.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3459/ 159576 | consumed samples: 63440 | elapsed time per iteration (ms): 14690.5 | learning rate: 1.758E-05 | global batch size: 32 | lm loss: 6.549660E+00 | loss scale: 32768.0 | grad norm: 177618.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3460/ 159576 | consumed samples: 63472 | elapsed time per iteration (ms): 14572.5 | learning rate: 1.759E-05 | global batch size: 32 | lm loss: 6.509130E+00 | loss scale: 32768.0 | grad norm: 102216.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3461/ 159576 | consumed samples: 63504 | elapsed time per iteration (ms): 14630.9 | learning rate: 1.760E-05 | global batch size: 32 | lm loss: 6.474805E+00 | loss scale: 32768.0 | grad norm: 198903.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3462/ 159576 | consumed samples: 63536 | elapsed time per iteration (ms): 14903.4 | learning rate: 1.761E-05 | global batch size: 32 | lm loss: 6.343786E+00 | loss scale: 32768.0 | grad norm: 142714.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3463/ 159576 | consumed samples: 63568 | elapsed time per iteration (ms): 14638.9 | learning rate: 1.761E-05 | global batch size: 32 | lm loss: 6.644784E+00 | loss scale: 32768.0 | grad norm: 158591.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3464/ 159576 | consumed samples: 63600 | elapsed time per iteration (ms): 14613.0 | learning rate: 1.762E-05 | global batch size: 32 | lm loss: 6.625895E+00 | loss scale: 32768.0 | grad norm: 123320.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3465/ 159576 | consumed samples: 63632 | elapsed time per iteration (ms): 14585.1 | learning rate: 1.763E-05 | global batch size: 32 | lm loss: 6.575481E+00 | loss scale: 32768.0 | grad norm: 175492.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3466/ 159576 | consumed samples: 63664 | elapsed time per iteration (ms): 15007.9 | learning rate: 1.764E-05 | global batch size: 32 | lm loss: 6.510527E+00 | loss scale: 32768.0 | grad norm: 141462.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3467/ 159576 | consumed samples: 63696 | elapsed time per iteration (ms): 14658.4 | learning rate: 1.765E-05 | global batch size: 32 | lm loss: 6.281921E+00 | loss scale: 32768.0 | grad norm: 133404.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3468/ 159576 | consumed samples: 63728 | elapsed time per iteration (ms): 14580.1 | learning rate: 1.766E-05 | global batch size: 32 | lm loss: 6.438425E+00 | loss scale: 32768.0 | grad norm: 155340.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3469/ 159576 | consumed samples: 63760 | elapsed time per iteration (ms): 14575.6 | learning rate: 1.767E-05 | global batch size: 32 | lm loss: 6.527649E+00 | loss scale: 32768.0 | grad norm: 99587.133 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3470/ 159576 | consumed samples: 63792 | elapsed time per iteration (ms): 14895.6 | learning rate: 1.768E-05 | global batch size: 32 | lm loss: 6.196751E+00 | loss scale: 32768.0 | grad norm: 208702.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3471/ 159576 | consumed samples: 63824 | elapsed time per iteration (ms): 14601.7 | learning rate: 1.768E-05 | global batch size: 32 | lm loss: 6.487125E+00 | loss scale: 32768.0 | grad norm: 168900.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3472/ 159576 | consumed samples: 63856 | elapsed time per iteration (ms): 14566.0 | learning rate: 1.769E-05 | global batch size: 32 | lm loss: 6.509688E+00 | loss scale: 32768.0 | grad norm: 154921.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3473/ 159576 | consumed samples: 63888 | elapsed time per iteration (ms): 14575.1 | learning rate: 1.770E-05 | global batch size: 32 | lm loss: 6.622843E+00 | loss scale: 32768.0 | grad norm: 140472.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3474/ 159576 | consumed samples: 63920 | elapsed time per iteration (ms): 14877.5 | learning rate: 1.771E-05 | global batch size: 32 | lm loss: 6.475362E+00 | loss scale: 32768.0 | grad norm: 119718.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3475/ 159576 | consumed samples: 63952 | elapsed time per iteration (ms): 14552.0 | learning rate: 1.772E-05 | global batch size: 32 | lm loss: 6.465285E+00 | loss scale: 32768.0 | grad norm: 172671.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3476/ 159576 | consumed samples: 63984 | elapsed time per iteration (ms): 14582.7 | learning rate: 1.773E-05 | global batch size: 32 | lm loss: 6.389154E+00 | loss scale: 32768.0 | grad norm: 113417.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3477/ 159576 | consumed samples: 64016 | elapsed time per iteration (ms): 14606.6 | learning rate: 1.774E-05 | global batch size: 32 | lm loss: 6.582153E+00 | loss scale: 32768.0 | grad norm: 139244.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3478/ 159576 | consumed samples: 64048 | elapsed time per iteration (ms): 14915.2 | learning rate: 1.775E-05 | global batch size: 32 | lm loss: 6.490180E+00 | loss scale: 32768.0 | grad norm: 94281.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3479/ 159576 | consumed samples: 64080 | elapsed time per iteration (ms): 14555.1 | learning rate: 1.776E-05 | global batch size: 32 | lm loss: 6.683810E+00 | loss scale: 32768.0 | grad norm: 149137.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3480/ 159576 | consumed samples: 64112 | elapsed time per iteration (ms): 14553.1 | learning rate: 1.776E-05 | global batch size: 32 | lm loss: 6.534214E+00 | loss scale: 32768.0 | grad norm: 129169.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3481/ 159576 | consumed samples: 64144 | elapsed time per iteration (ms): 14603.3 | learning 
rate: 1.777E-05 | global batch size: 32 | lm loss: 6.581446E+00 | loss scale: 32768.0 | grad norm: 115991.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3482/ 159576 | consumed samples: 64176 | elapsed time per iteration (ms): 14916.9 | learning rate: 1.778E-05 | global batch size: 32 | lm loss: 6.567008E+00 | loss scale: 32768.0 | grad norm: 184960.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3483/ 159576 | consumed samples: 64208 | elapsed time per iteration (ms): 14481.2 | learning rate: 1.779E-05 | global batch size: 32 | lm loss: 6.662760E+00 | loss scale: 32768.0 | grad norm: 134077.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3484/ 159576 | consumed samples: 64240 | elapsed time per iteration (ms): 14567.5 | learning rate: 1.780E-05 | global batch size: 32 | lm loss: 6.589795E+00 | loss scale: 32768.0 | grad norm: 126611.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3485/ 159576 | consumed samples: 64272 | elapsed time per iteration (ms): 14495.3 | learning rate: 1.781E-05 | global batch size: 32 | lm loss: 6.497936E+00 | loss scale: 32768.0 | grad norm: 122115.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3486/ 159576 | consumed samples: 64304 | elapsed time per iteration (ms): 14568.8 | learning rate: 1.782E-05 | global batch size: 32 | lm loss: 6.558665E+00 | loss scale: 32768.0 | grad norm: 126373.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3487/ 159576 | consumed samples: 64336 | elapsed time per iteration (ms): 14913.4 | learning rate: 1.783E-05 | global batch size: 32 | lm loss: 6.431637E+00 | loss scale: 32768.0 | grad norm: 161636.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3488/ 159576 | consumed samples: 64368 | elapsed time per iteration (ms): 14528.7 | learning rate: 1.784E-05 | global batch size: 32 | lm loss: 6.356628E+00 | loss scale: 32768.0 | grad norm: 114700.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3489/ 159576 | consumed samples: 64400 | elapsed time per iteration (ms): 14522.5 | learning rate: 1.784E-05 | global batch size: 32 | lm loss: 6.470509E+00 | loss scale: 32768.0 | grad norm: 157358.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3490/ 159576 | consumed samples: 64432 | elapsed time per iteration (ms): 14512.2 | learning rate: 1.785E-05 | global batch size: 32 | lm loss: 6.580731E+00 | loss scale: 32768.0 | grad norm: 124839.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3491/ 159576 | consumed samples: 64464 | elapsed time per iteration (ms): 14760.8 | learning rate: 1.786E-05 | global batch size: 32 | lm loss: 6.545910E+00 | loss scale: 32768.0 | grad norm: 225734.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3492/ 159576 | consumed samples: 64496 | elapsed time per iteration (ms): 14465.1 | learning rate: 1.787E-05 | global batch size: 32 | lm loss: 6.462240E+00 | loss scale: 32768.0 | grad norm: 157153.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | time (ms) iteration 3493/ 159576 | consumed samples: 64528 | elapsed time per iteration (ms): 14555.7 | learning rate: 1.788E-05 | global batch size: 32 | lm loss: 6.526244E+00 | loss scale: 32768.0 | grad norm: 134834.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3494/ 159576 | consumed samples: 64560 | elapsed time per iteration (ms): 14523.5 | learning rate: 1.789E-05 | global batch size: 32 | lm loss: 6.464767E+00 | loss scale: 32768.0 | grad norm: 111080.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3495/ 159576 | consumed samples: 64592 | elapsed time per iteration (ms): 14680.5 | learning rate: 1.790E-05 | global batch size: 32 | lm loss: 6.498696E+00 | loss scale: 32768.0 | grad norm: 149926.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3496/ 159576 | consumed samples: 64624 | elapsed time per iteration (ms): 14537.6 | learning rate: 1.791E-05 | global batch size: 32 | lm loss: 6.801207E+00 | loss scale: 32768.0 | grad norm: 169978.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3497/ 159576 | consumed samples: 64656 | elapsed time per iteration (ms): 14576.8 | learning rate: 1.792E-05 | global batch size: 32 | lm loss: 6.458578E+00 | loss scale: 32768.0 | grad norm: 128624.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3498/ 159576 | consumed samples: 64688 | elapsed time per iteration (ms): 14451.0 | learning rate: 1.792E-05 | global batch size: 32 | lm loss: 6.562904E+00 | loss scale: 32768.0 | grad norm: 201818.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3499/ 159576 | consumed samples: 64720 | elapsed time per iteration (ms): 14843.4 | learning rate: 1.793E-05 | global batch size: 32 | lm loss: 6.620703E+00 | loss scale: 32768.0 | grad norm: 136369.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3500/ 159576 | consumed samples: 64752 | elapsed time per iteration (ms): 14591.5 | learning rate: 1.794E-05 | global batch size: 32 | lm loss: 6.545550E+00 | loss scale: 32768.0 | grad norm: 169642.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3501/ 159576 | consumed samples: 64784 | elapsed time per iteration (ms): 14557.9 | learning rate: 1.795E-05 | global batch size: 32 | lm loss: 6.401666E+00 | loss scale: 32768.0 | grad norm: 152333.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3502/ 159576 | consumed samples: 64816 | elapsed time per iteration (ms): 14554.3 | learning rate: 1.796E-05 | global batch size: 32 | lm loss: 6.776519E+00 | loss scale: 32768.0 | grad norm: 234394.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3503/ 159576 | consumed samples: 64848 | elapsed time per iteration (ms): 14868.0 | learning rate: 1.797E-05 | global batch size: 32 | lm loss: 6.465873E+00 | loss scale: 32768.0 | grad norm: 117665.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3504/ 159576 | consumed samples: 64880 | elapsed time per iteration (ms): 14552.4 | learning rate: 1.798E-05 | global batch size: 32 | lm loss: 6.534934E+00 | loss 
scale: 32768.0 | grad norm: 205418.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3505/ 159576 | consumed samples: 64912 | elapsed time per iteration (ms): 14532.4 | learning rate: 1.799E-05 | global batch size: 32 | lm loss: 6.777419E+00 | loss scale: 32768.0 | grad norm: 156642.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3506/ 159576 | consumed samples: 64944 | elapsed time per iteration (ms): 14549.9 | learning rate: 1.800E-05 | global batch size: 32 | lm loss: 6.528007E+00 | loss scale: 32768.0 | grad norm: 168324.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3507/ 159576 | consumed samples: 64976 | elapsed time per iteration (ms): 14947.6 | learning rate: 1.800E-05 | global batch size: 32 | lm loss: 6.669527E+00 | loss scale: 32768.0 | grad norm: 116164.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3508/ 159576 | consumed samples: 65008 | elapsed time per iteration (ms): 14485.1 | learning rate: 1.801E-05 | global batch size: 32 | lm loss: 6.649974E+00 | loss scale: 32768.0 | grad norm: 195968.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3509/ 159576 | consumed samples: 65040 | elapsed time per iteration (ms): 14549.4 | learning rate: 1.802E-05 | global batch size: 32 | lm loss: 6.636446E+00 | loss scale: 32768.0 | grad norm: 135969.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3510/ 159576 | consumed samples: 65072 | elapsed time per iteration (ms): 14546.9 | learning rate: 1.803E-05 | global batch size: 32 | lm loss: 6.529005E+00 | loss scale: 32768.0 | grad norm: 225903.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3511/ 159576 | consumed samples: 65104 | elapsed time per iteration (ms): 14847.8 | learning rate: 1.804E-05 | global batch size: 32 | lm loss: 6.629415E+00 | loss scale: 32768.0 | grad norm: 130652.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3512/ 159576 | consumed samples: 65136 | elapsed time per iteration (ms): 14520.0 | learning rate: 1.805E-05 | global batch size: 32 | lm loss: 6.599288E+00 | loss scale: 32768.0 | grad norm: 149863.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3513/ 159576 | consumed samples: 65168 | elapsed time per iteration (ms): 14651.1 | learning rate: 1.806E-05 | global batch size: 32 | lm loss: 6.592654E+00 | loss scale: 32768.0 | grad norm: 166996.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3514/ 159576 | consumed samples: 65200 | elapsed time per iteration (ms): 14479.3 | learning rate: 1.807E-05 | global batch size: 32 | lm loss: 6.540200E+00 | loss scale: 32768.0 | grad norm: 115498.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3515/ 159576 | consumed samples: 65232 | elapsed time per iteration (ms): 14930.0 | learning rate: 1.808E-05 | global batch size: 32 | lm loss: 6.488201E+00 | loss scale: 32768.0 | grad norm: 217689.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3516/ 159576 | consumed samples: 65264 | 
elapsed time per iteration (ms): 14459.8 | learning rate: 1.808E-05 | global batch size: 32 | lm loss: 6.478746E+00 | loss scale: 32768.0 | grad norm: 131460.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3517/ 159576 | consumed samples: 65296 | elapsed time per iteration (ms): 14524.9 | learning rate: 1.809E-05 | global batch size: 32 | lm loss: 6.658568E+00 | loss scale: 32768.0 | grad norm: 186540.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3518/ 159576 | consumed samples: 65328 | elapsed time per iteration (ms): 14525.2 | learning rate: 1.810E-05 | global batch size: 32 | lm loss: 6.641760E+00 | loss scale: 32768.0 | grad norm: 215453.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3519/ 159576 | consumed samples: 65360 | elapsed time per iteration (ms): 14903.9 | learning rate: 1.811E-05 | global batch size: 32 | lm loss: 6.578794E+00 | loss scale: 32768.0 | grad norm: 129785.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3520/ 159576 | consumed samples: 65392 | elapsed time per iteration (ms): 14710.5 | learning rate: 1.812E-05 | global batch size: 32 | lm loss: 6.623507E+00 | loss scale: 32768.0 | grad norm: 120935.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3521/ 159576 | consumed samples: 65424 | elapsed time per iteration (ms): 14520.7 | learning rate: 1.813E-05 | global batch size: 32 | lm loss: 6.597843E+00 | loss scale: 32768.0 | grad norm: 116244.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3522/ 159576 | consumed samples: 65456 | elapsed time per iteration (ms): 14597.0 | learning rate: 1.814E-05 | global batch size: 32 | lm loss: 6.504926E+00 | loss scale: 32768.0 | grad norm: 134767.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3523/ 159576 | consumed samples: 65488 | elapsed time per iteration (ms): 14942.9 | learning rate: 1.815E-05 | global batch size: 32 | lm loss: 6.435289E+00 | loss scale: 32768.0 | grad norm: 86682.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3524/ 159576 | consumed samples: 65520 | elapsed time per iteration (ms): 14654.2 | learning rate: 1.816E-05 | global batch size: 32 | lm loss: 6.594196E+00 | loss scale: 32768.0 | grad norm: 134027.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3525/ 159576 | consumed samples: 65552 | elapsed time per iteration (ms): 14562.7 | learning rate: 1.816E-05 | global batch size: 32 | lm loss: 6.679243E+00 | loss scale: 32768.0 | grad norm: 125221.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3526/ 159576 | consumed samples: 65584 | elapsed time per iteration (ms): 14630.7 | learning rate: 1.817E-05 | global batch size: 32 | lm loss: 6.456674E+00 | loss scale: 32768.0 | grad norm: 86112.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3527/ 159576 | consumed samples: 65616 | elapsed time per iteration (ms): 14493.8 | learning rate: 1.818E-05 | global batch size: 32 | lm loss: 6.600234E+00 | loss scale: 32768.0 | grad norm: 300729.659 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3528/ 159576 | consumed samples: 65648 | elapsed time per iteration (ms): 14813.0 | learning rate: 1.819E-05 | global batch size: 32 | lm loss: 6.399897E+00 | loss scale: 32768.0 | grad norm: 153878.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3529/ 159576 | consumed samples: 65680 | elapsed time per iteration (ms): 14593.6 | learning rate: 1.820E-05 | global batch size: 32 | lm loss: 6.540657E+00 | loss scale: 32768.0 | grad norm: 150860.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3530/ 159576 | consumed samples: 65712 | elapsed time per iteration (ms): 14559.8 | learning rate: 1.821E-05 | global batch size: 32 | lm loss: 6.503862E+00 | loss scale: 32768.0 | grad norm: 149193.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3531/ 159576 | consumed samples: 65744 | elapsed time per iteration (ms): 14581.4 | learning rate: 1.822E-05 | global batch size: 32 | lm loss: 6.692787E+00 | loss scale: 32768.0 | grad norm: 207812.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3532/ 159576 | consumed samples: 65776 | elapsed time per iteration (ms): 14715.5 | learning rate: 1.823E-05 | global batch size: 32 | lm loss: 6.484317E+00 | loss scale: 32768.0 | grad norm: 161092.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3533/ 159576 | consumed samples: 65808 | elapsed time per iteration (ms): 14610.9 | learning rate: 1.824E-05 | global batch size: 32 | lm loss: 6.475138E+00 | loss scale: 32768.0 | grad norm: 155421.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3534/ 159576 | consumed samples: 65840 | elapsed time per iteration (ms): 14445.3 | learning rate: 1.824E-05 | global batch size: 32 | lm loss: 6.511703E+00 | loss scale: 32768.0 | grad norm: 114681.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3535/ 159576 | consumed samples: 65872 | elapsed time per iteration (ms): 14477.9 | learning rate: 1.825E-05 | global batch size: 32 | lm loss: 6.509159E+00 | loss scale: 32768.0 | grad norm: 183050.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3536/ 159576 | consumed samples: 65904 | elapsed time per iteration (ms): 14816.2 | learning rate: 1.826E-05 | global batch size: 32 | lm loss: 6.497670E+00 | loss scale: 32768.0 | grad norm: 96091.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3537/ 159576 | consumed samples: 65936 | elapsed time per iteration (ms): 14439.5 | learning rate: 1.827E-05 | global batch size: 32 | lm loss: 6.505747E+00 | loss scale: 32768.0 | grad norm: 140156.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3538/ 159576 | consumed samples: 65968 | elapsed time per iteration (ms): 14594.1 | learning rate: 1.828E-05 | global batch size: 32 | lm loss: 6.516546E+00 | loss scale: 32768.0 | grad norm: 97276.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3539/ 159576 | consumed samples: 66000 | elapsed time per iteration (ms): 14531.0 | learning rate: 1.829E-05 | 
global batch size: 32 | lm loss: 6.589782E+00 | loss scale: 32768.0 | grad norm: 283362.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3540/ 159576 | consumed samples: 66032 | elapsed time per iteration (ms): 14766.1 | learning rate: 1.830E-05 | global batch size: 32 | lm loss: 6.457118E+00 | loss scale: 32768.0 | grad norm: 119093.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3541/ 159576 | consumed samples: 66064 | elapsed time per iteration (ms): 14538.8 | learning rate: 1.831E-05 | global batch size: 32 | lm loss: 6.543458E+00 | loss scale: 32768.0 | grad norm: 143270.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3542/ 159576 | consumed samples: 66096 | elapsed time per iteration (ms): 14503.8 | learning rate: 1.832E-05 | global batch size: 32 | lm loss: 6.549830E+00 | loss scale: 32768.0 | grad norm: 146934.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3543/ 159576 | consumed samples: 66128 | elapsed time per iteration (ms): 14525.1 | learning rate: 1.832E-05 | global batch size: 32 | lm loss: 6.523373E+00 | loss scale: 32768.0 | grad norm: 246079.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3544/ 159576 | consumed samples: 66160 | elapsed time per iteration (ms): 14836.5 | learning rate: 1.833E-05 | global batch size: 32 | lm loss: 6.484323E+00 | loss scale: 32768.0 | grad norm: 150473.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3545/ 159576 | consumed samples: 66192 | elapsed time per iteration (ms): 14612.1 | learning rate: 1.834E-05 | global batch size: 32 | lm loss: 6.596731E+00 | loss scale: 32768.0 | grad norm: 157995.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3546/ 159576 | consumed samples: 66224 | elapsed time per iteration (ms): 14518.2 | learning rate: 1.835E-05 | global batch size: 32 | lm loss: 6.564546E+00 | loss scale: 32768.0 | grad norm: 164874.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3547/ 159576 | consumed samples: 66256 | elapsed time per iteration (ms): 14501.0 | learning rate: 1.836E-05 | global batch size: 32 | lm loss: 6.427078E+00 | loss scale: 32768.0 | grad norm: 175876.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3548/ 159576 | consumed samples: 66288 | elapsed time per iteration (ms): 14899.9 | learning rate: 1.837E-05 | global batch size: 32 | lm loss: 6.488606E+00 | loss scale: 32768.0 | grad norm: 198886.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3549/ 159576 | consumed samples: 66320 | elapsed time per iteration (ms): 14520.6 | learning rate: 1.838E-05 | global batch size: 32 | lm loss: 6.462682E+00 | loss scale: 32768.0 | grad norm: 127675.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3550/ 159576 | consumed samples: 66352 | elapsed time per iteration (ms): 14447.8 | learning rate: 1.839E-05 | global batch size: 32 | lm loss: 6.652044E+00 | loss scale: 32768.0 | grad norm: 140944.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration | ms/iter | learning rate | lm loss | grad norm
3551 | 14467.2 | 1.839E-05 | 6.520955 | 86094.102
3552 | 14808.2 | 1.840E-05 | 6.429432 | 116647.112
3553 | 14503.5 | 1.841E-05 | 6.463936 | 118564.730
3554 | 14502.1 | 1.842E-05 | 6.458220 | 112013.908
3555 | 14486.2 | 1.843E-05 | 6.492205 | 95075.794
3556 | 14873.1 | 1.844E-05 | 6.582590 | 160024.973
3557 | 14487.7 | 1.845E-05 | 6.504139 | 102536.359
3558 | 14571.2 | 1.846E-05 | 6.514203 | 221229.679
3559 | 14451.0 | 1.847E-05 | 6.560319 | 131012.754
3560 | 14938.1 | 1.847E-05 | 6.372297 | 139056.836
3561 | 14523.1 | 1.848E-05 | 6.416655 | 147497.179
3562 | 14487.9 | 1.849E-05 | 6.474949 | 174437.813
3563 | 14468.9 | 1.850E-05 | 6.623423 | 122791.597
3564 | 14508.1 | 1.851E-05 | 6.516719 | 125896.178
3565 | 14821.3 | 1.852E-05 | 6.567136 | 156146.827
3566 | 14550.7 | 1.853E-05 | 6.464426 | 112089.852
3567 | 14483.3 | 1.854E-05 | 6.330031 | 100672.150
3568 | 14573.3 | 1.855E-05 | 6.472744 | 206164.387
3569 | 14778.2 | 1.855E-05 | 6.502261 | 117741.940
3570 | 14563.8 | 1.856E-05 | 6.480472 | 180667.970
3571 | 14517.4 | 1.857E-05 | 6.653479 | 121625.335
3572 | 14532.0 | 1.858E-05 | 6.478413 | 135823.282
3573 | 14807.4 | 1.859E-05 | 6.589501 | 147763.903
3574 | 14483.4 | 1.860E-05 | 6.503617 | 85865.567
3575 | 14505.6 | 1.861E-05 | 6.573061 | 180050.879
3576 | 14550.9 | 1.862E-05 | 6.480776 | 122066.327
3577 | 14868.6 | 1.863E-05 | 6.625753 | 166062.326
3578 | 14594.8 | 1.863E-05 | 6.470201 | 158898.525

[2021-09-24 16:06:53] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 16:06:53] PULSE: tr8-104B is running for 10:14:42 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])

3579 | 14505.5 | 1.864E-05 | 6.669123 | 114371.472
3580 | 14435.4 | 1.865E-05 | 6.504656 | 143322.183
3581 | 14983.8 | 1.866E-05 | 6.634960 | 124051.571
3582 | 14518.7 | 1.867E-05 | 6.488723 | 108661.260
3583 | 14495.4 | 1.868E-05 | 6.397575 | 156428.484
3584 | 14500.4 | 1.869E-05 | 6.505555 | 158735.801
3585 | 14850.8 | 1.870E-05 | 6.384704 | 121455.406
3586 | 14516.1 | 1.871E-05 | 6.391223 | 200272.961
3587 | 14478.9 | 1.871E-05 | 6.602296 | 156857.138
3588 | 14457.3 | 1.872E-05 | 6.356599 | 132240.106
3589 | 14840.9 | 1.873E-05 | 6.517581 | 101976.390
3590 | 14478.5 | 1.874E-05 | 6.495076 | 145637.558
3591 | 14537.3 | 1.875E-05 | 6.486649 | 110128.136
3592 | 14585.1 | 1.876E-05 | 6.484485 | 93123.364
3593 | 14970.8 | 1.877E-05 | 6.605970 | 196733.888
3594 | 14488.2 | 1.878E-05 | 6.408032 | 119062.835
3595 | 14589.0 | 1.879E-05 | 6.434669 | 163713.000
3596 | 14467.1 | 1.879E-05 | 6.515763 | 123609.059
3597 | 14918.0 | 1.880E-05 | 6.473671 | 113241.499
3598 | 14630.3 | 1.881E-05 | 6.497471 | 180550.199
3599 | 14523.9 | 1.882E-05 | 6.665214 | 120833.867
3600 | 14548.6 | 1.883E-05 | 6.506467 | 124134.552
3601 | 14576.2 | 1.884E-05 | 6.491764 | 230059.443
3602 | 14979.8 | 1.885E-05 | 6.445697 | 125622.628
3603 | 14453.6 | 1.886E-05 | 6.613330 | 166344.814
3604 | 14495.4 | 1.887E-05 | 6.603212 | 93757.784
3605 | 14542.0 | 1.887E-05 | 6.342390 | 130006.029
3606 | 14685.4 | 1.888E-05 | 6.480408 | 106365.528
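The step time in these records hovers around 14.5 s for a global batch of 32 samples, i.e. roughly 2.2 samples/s. A back-of-envelope sketch of what that implies for the full run, using only numbers read off the log (the "done_iters" value is approximate, and the estimate assumes step time and batch size stay at the values shown here):

# Back-of-envelope throughput and remaining compute time.
step_s = 14.55            # typical elapsed time per iteration, seconds
batch = 32                # global batch size from the records
total_iters = 159_576     # from the "iteration N/ 159576" field
done_iters = 3_600        # roughly where this log segment sits

samples_per_s = batch / step_s                     # ~2.2 samples/s
remaining_s = (total_iters - done_iters) * step_s  # pure compute time
print(f"{samples_per_s:.2f} samples/s, "
      f"~{remaining_s / 86_400:.0f} days of compute remaining")

At face value that is about 26 days of uninterrupted compute left at this batch size, which is why the PULSE lines above matter: the job is being rescheduled in SLURM time slices rather than running once end to end.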
iteration | ms/iter | learning rate | lm loss | grad norm
3607 | 14517.9 | 1.889E-05 | 6.591272 | 171235.897
3608 | 14591.0 | 1.890E-05 | 6.311239 | 126858.601
3609 | 14549.9 | 1.891E-05 | 6.395494 | 227345.632
3610 | 14677.9 | 1.892E-05 | 6.557859 | 116386.145
3611 | 14497.7 | 1.893E-05 | 6.436782 | 130216.388
3612 | 14516.9 | 1.894E-05 | 6.523721 | 153807.816
3613 | 14537.1 | 1.895E-05 | 6.480092 | 191977.060
3614 | 14777.4 | 1.895E-05 | 6.507137 | 147123.785
3615 | 14631.8 | 1.896E-05 | 6.413469 | 151298.616
3616 | 14498.7 | 1.897E-05 | 6.400654 | 144773.834
3617 | 14516.2 | 1.898E-05 | 6.514056 | 212184.973
3618 | 15120.1 | 1.899E-05 | 6.476982 | 138389.337
3619 | 14520.5 | 1.900E-05 | 6.413394 | 144757.897
3620 | 14501.8 | 1.901E-05 | 6.508588 | 119480.778
3621 | 14544.3 | 1.902E-05 | 6.462088 | 118576.762
3622 | 14904.8 | 1.903E-05 | 6.518481 | 166384.993
3623 | 14536.7 | 1.903E-05 | 6.418991 | 133937.631
3624 | 14549.8 | 1.904E-05 | 6.446878 | 270206.058
3625 | 14599.2 | 1.905E-05 | 6.534576 | 155344.465
3626 | 14722.9 | 1.906E-05 | 6.630429 | 199114.246
3627 | 14500.1 | 1.907E-05 | 6.356173 | 167282.135
3628 | 14530.4 | 1.908E-05 | 6.471046 | 208481.248
3629 | 14549.1 | 1.909E-05 | 6.412348 | 149105.571
3630 | 14882.4 | 1.910E-05 | 6.520298 | 123369.844
3631 | 14575.6 | 1.911E-05 | 6.558264 | 243133.943
3632 | 14516.5 | 1.911E-05 | 6.583918 | 178142.765
3633 | 14471.4 | 1.912E-05 | 6.540310 | 189782.276
3634 | 14945.9 | 1.913E-05 | 6.505736 | 165872.968
3635 | 14539.5 | 1.914E-05 | 6.509236 | 245470.953
3636 | 14545.2 | 1.915E-05 | 6.504992 | 150104.290
3637 | 14567.6 | 1.916E-05 | 6.406890 | 135913.146
3638 | 14896.3 | 1.917E-05 | 6.443694 | 185702.085
3639 | 14591.0 | 1.918E-05 | 6.556330 | 244123.289
3640 | 14549.7 | 1.918E-05 | 6.487778 | 177114.568
3641 | 14570.7 | 1.919E-05 | 6.513255 | 131694.234
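While lm loss drifts only slowly here, the grad norm column swings between roughly 6e4 and 3.5e5, so it is the column worth watching for instability. A crude way to scan it mechanically is to flag readings far above a trailing median; the sketch below is illustrative (window and factor are not from this run's tooling), and it expects (iteration, grad norm) pairs such as those produced by the parser earlier.

from collections import deque
from statistics import median

def spike_flags(gnorms, window=21, factor=2.0):
    """Yield (step, grad_norm) readings that exceed `factor` times the
    trailing median of the previous `window` readings -- a crude spike
    detector for eyeballing columns like the tables above."""
    recent = deque(maxlen=window)
    for step, g in gnorms:
        if len(recent) == window and g > factor * median(recent):
            yield step, g
        recent.append(g)

With the default factor of 2.0 it would, for example, flag the 352497.893 reading at iteration 3708 below, which is more than twice its neighbours.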
iteration | ms/iter | learning rate | lm loss | grad norm
3642 | 14516.4 | 1.920E-05 | 6.592026 | 290876.521
3643 | 14756.7 | 1.921E-05 | 6.662066 | 228974.687
3644 | 14551.2 | 1.922E-05 | 6.366663 | 161091.231
3645 | 14619.9 | 1.923E-05 | 6.523453 | 136622.848
3646 | 14549.7 | 1.924E-05 | 6.502388 | 233041.164
3647 | 14639.6 | 1.925E-05 | 6.570889 | 177700.635
3648 | 14511.4 | 1.926E-05 | 6.538668 | 167613.706
3649 | 14499.6 | 1.926E-05 | 6.650812 | 144019.361
3650 | 14509.6 | 1.927E-05 | 6.449777 | 190635.397
3651 | 14775.5 | 1.928E-05 | 6.435673 | 181537.989
3652 | 14563.5 | 1.929E-05 | 6.631623 | 150202.284
3653 | 14524.8 | 1.930E-05 | 6.612866 | 136863.545
3654 | 14611.3 | 1.931E-05 | 6.471664 | 177103.324
3655 | 14752.9 | 1.932E-05 | 6.436707 | 107210.342
3656 | 14544.1 | 1.933E-05 | 6.679466 | 156389.742
3657 | 14560.9 | 1.934E-05 | 6.478530 | 136151.461
3658 | 14516.8 | 1.934E-05 | 6.537941 | 169825.588
3659 | 15041.8 | 1.935E-05 | 6.414840 | 116305.156
3660 | 14596.0 | 1.936E-05 | 6.423607 | 157726.425
3661 | 14600.4 | 1.937E-05 | 6.516055 | 150170.125
3662 | 14508.1 | 1.938E-05 | 6.406610 | 180125.834
3663 | 14795.2 | 1.939E-05 | 6.495340 | 156226.253
3664 | 14502.7 | 1.940E-05 | 6.478324 | 139199.774
3665 | 14521.4 | 1.941E-05 | 6.486080 | 139987.206
3666 | 14501.0 | 1.942E-05 | 6.412463 | 187000.562
3667 | 14907.7 | 1.942E-05 | 6.555160 | 151236.383
3668 | 14546.0 | 1.943E-05 | 6.466833 | 188341.809
3669 | 14504.0 | 1.944E-05 | 6.512917 | 142898.213
3670 | 14550.7 | 1.945E-05 | 6.662933 | 155470.352
3671 | 14892.4 | 1.946E-05 | 6.373161 | 150042.585
3672 | 14566.7 | 1.947E-05 | 6.426474 | 170805.274
3673 | 14501.7 | 1.948E-05 | 6.370544 | 138493.754
3674 | 14600.9 | 1.949E-05 | 6.383911 | 137200.588
3675 | 14904.3 | 1.950E-05 | 6.430146 | 130856.844
3676 | 14544.1 | 1.950E-05 | 6.359234 | 123290.267
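Two records in the segment below stand out: iterations 3753 and 3756 each take ~7.7 s instead of ~14.5 s and repeat the previous step's grad norm verbatim, and at 3756 the loss scale halves from 32768.0 to 16384.0. That pattern is consistent with fp16 dynamic loss scaling skipping overflowed steps, with the first overflow tolerated and the second one triggering a backoff, as a scaler with hysteresis would do. The sketch below shows the usual mechanism only; the parameter values are generic, not read from this run's config, and the records here still print "skipped iterations: 0", so the exact bookkeeping in the real implementation evidently differs.

class DynamicLossScaler:
    """Textbook fp16 dynamic loss scaling with hysteresis (illustrative)."""
    def __init__(self, scale=32768.0, backoff=0.5, growth=2.0,
                 interval=1000, hysteresis=2):
        self.scale = scale
        self.backoff = backoff        # multiply scale by this on backoff
        self.growth = growth          # grow after `interval` clean steps
        self.interval = interval
        self.hysteresis = hysteresis  # overflows tolerated before backoff
        self._patience = hysteresis
        self._good_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Return True if this step should be skipped (no optimizer update)."""
        if found_overflow:
            self._patience -= 1
            self._good_steps = 0
            if self._patience <= 0:       # e.g. 32768 -> 16384 at iter 3756
                self.scale *= self.backoff
                self._patience = self.hysteresis
            return True
        self._good_steps += 1
        if self._good_steps % self.interval == 0:
            self.scale *= self.growth
        return False

Fed the log's sequence, this would skip at 3753 without touching the scale (patience 2 -> 1) and skip again at 3756 while halving it, which matches the table that follows; the much lower grad norms after 3757 also line up with the smaller scale.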
iteration 3677/ 159576 | consumed samples: 70416 | elapsed time per iteration (ms): 14660.6 | learning rate: 1.951E-05 | global batch size: 32 | lm loss: 6.340640E+00 | loss scale: 32768.0 | grad norm: 128445.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3678/ 159576 | consumed samples: 70448 | elapsed time per iteration (ms): 14469.4 | learning rate: 1.952E-05 | global batch size: 32 | lm loss: 6.467716E+00 | loss scale: 32768.0 | grad norm: 222732.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3679/ 159576 | consumed samples: 70480 | elapsed time per iteration (ms): 14540.6 | learning rate: 1.953E-05 | global batch size: 32 | lm loss: 6.401999E+00 | loss scale: 32768.0 | grad norm: 143732.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3680/ 159576 | consumed samples: 70512 | elapsed time per iteration (ms): 14837.8 | learning rate: 1.954E-05 | global batch size: 32 | lm loss: 6.469200E+00 | loss scale: 32768.0 | grad norm: 148617.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3681/ 159576 | consumed samples: 70544 | elapsed time per iteration (ms): 14560.6 | learning rate: 1.955E-05 | global batch size: 32 | lm loss: 6.503996E+00 | loss scale: 32768.0 | grad norm: 151584.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3682/ 159576 | consumed samples: 70576 | elapsed time per iteration (ms): 14533.4 | learning rate: 1.956E-05 | global batch size: 32 | lm loss: 6.473675E+00 | loss scale: 32768.0 | grad norm: 171148.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3683/ 159576 | consumed samples: 70608 | elapsed time per iteration (ms): 14606.7 | learning rate: 1.957E-05 | global batch size: 32 | lm loss: 6.406356E+00 | loss scale: 32768.0 | grad norm: 139281.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3684/ 159576 | consumed samples: 70640 | elapsed time per iteration (ms): 14772.8 | learning rate: 1.958E-05 | global batch size: 32 | lm loss: 6.329139E+00 | loss scale: 32768.0 | grad norm: 108055.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3685/ 159576 | consumed samples: 70672 | elapsed time per iteration (ms): 14518.6 | learning rate: 1.958E-05 | global batch size: 32 | lm loss: 6.525671E+00 | loss scale: 32768.0 | grad norm: 204684.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3686/ 159576 | consumed samples: 70704 | elapsed time per iteration (ms): 14569.3 | learning rate: 1.959E-05 | global batch size: 32 | lm loss: 6.454522E+00 | loss scale: 32768.0 | grad norm: 108450.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3687/ 159576 | consumed samples: 70736 | elapsed time per iteration (ms): 14527.9 | learning rate: 1.960E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 32768.0 | grad norm: 154981.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3688/ 159576 | consumed samples: 70768 | elapsed time per iteration (ms): 14681.9 | learning rate: 1.961E-05 | global batch size: 32 | lm loss: 6.485929E+00 | loss scale: 32768.0 
| grad norm: 132389.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3689/ 159576 | consumed samples: 70800 | elapsed time per iteration (ms): 14628.9 | learning rate: 1.962E-05 | global batch size: 32 | lm loss: 6.560607E+00 | loss scale: 32768.0 | grad norm: 244618.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3690/ 159576 | consumed samples: 70832 | elapsed time per iteration (ms): 14570.6 | learning rate: 1.963E-05 | global batch size: 32 | lm loss: 6.545405E+00 | loss scale: 32768.0 | grad norm: 207471.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3691/ 159576 | consumed samples: 70864 | elapsed time per iteration (ms): 14568.4 | learning rate: 1.964E-05 | global batch size: 32 | lm loss: 6.403141E+00 | loss scale: 32768.0 | grad norm: 160751.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3692/ 159576 | consumed samples: 70896 | elapsed time per iteration (ms): 14828.9 | learning rate: 1.965E-05 | global batch size: 32 | lm loss: 6.494320E+00 | loss scale: 32768.0 | grad norm: 142715.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3693/ 159576 | consumed samples: 70928 | elapsed time per iteration (ms): 14576.4 | learning rate: 1.966E-05 | global batch size: 32 | lm loss: 6.317194E+00 | loss scale: 32768.0 | grad norm: 218725.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3694/ 159576 | consumed samples: 70960 | elapsed time per iteration (ms): 14558.1 | learning rate: 1.966E-05 | global batch size: 32 | lm loss: 6.404289E+00 | loss scale: 32768.0 | grad norm: 133735.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3695/ 159576 | consumed samples: 70992 | elapsed time per iteration (ms): 14502.5 | learning rate: 1.967E-05 | global batch size: 32 | lm loss: 6.501413E+00 | loss scale: 32768.0 | grad norm: 126881.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3696/ 159576 | consumed samples: 71024 | elapsed time per iteration (ms): 14876.1 | learning rate: 1.968E-05 | global batch size: 32 | lm loss: 6.348512E+00 | loss scale: 32768.0 | grad norm: 117844.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3697/ 159576 | consumed samples: 71056 | elapsed time per iteration (ms): 14704.7 | learning rate: 1.969E-05 | global batch size: 32 | lm loss: 6.490881E+00 | loss scale: 32768.0 | grad norm: 191050.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3698/ 159576 | consumed samples: 71088 | elapsed time per iteration (ms): 14521.5 | learning rate: 1.970E-05 | global batch size: 32 | lm loss: 6.399506E+00 | loss scale: 32768.0 | grad norm: 131579.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3699/ 159576 | consumed samples: 71120 | elapsed time per iteration (ms): 14570.1 | learning rate: 1.971E-05 | global batch size: 32 | lm loss: 6.507861E+00 | loss scale: 32768.0 | grad norm: 124970.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3700/ 159576 | consumed samples: 71152 | elapsed time per 
iteration (ms): 15037.4 | learning rate: 1.972E-05 | global batch size: 32 | lm loss: 6.460707E+00 | loss scale: 32768.0 | grad norm: 163864.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3701/ 159576 | consumed samples: 71184 | elapsed time per iteration (ms): 14616.1 | learning rate: 1.973E-05 | global batch size: 32 | lm loss: 6.410345E+00 | loss scale: 32768.0 | grad norm: 155995.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3702/ 159576 | consumed samples: 71216 | elapsed time per iteration (ms): 14555.1 | learning rate: 1.974E-05 | global batch size: 32 | lm loss: 6.418409E+00 | loss scale: 32768.0 | grad norm: 135398.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3703/ 159576 | consumed samples: 71248 | elapsed time per iteration (ms): 14529.9 | learning rate: 1.974E-05 | global batch size: 32 | lm loss: 6.445669E+00 | loss scale: 32768.0 | grad norm: 149575.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3704/ 159576 | consumed samples: 71280 | elapsed time per iteration (ms): 14938.6 | learning rate: 1.975E-05 | global batch size: 32 | lm loss: 6.466682E+00 | loss scale: 32768.0 | grad norm: 158480.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3705/ 159576 | consumed samples: 71312 | elapsed time per iteration (ms): 14501.2 | learning rate: 1.976E-05 | global batch size: 32 | lm loss: 6.391745E+00 | loss scale: 32768.0 | grad norm: 130405.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3706/ 159576 | consumed samples: 71344 | elapsed time per iteration (ms): 14560.8 | learning rate: 1.977E-05 | global batch size: 32 | lm loss: 6.367959E+00 | loss scale: 32768.0 | grad norm: 134894.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3707/ 159576 | consumed samples: 71376 | elapsed time per iteration (ms): 14606.1 | learning rate: 1.978E-05 | global batch size: 32 | lm loss: 6.568520E+00 | loss scale: 32768.0 | grad norm: 127252.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3708/ 159576 | consumed samples: 71408 | elapsed time per iteration (ms): 14831.0 | learning rate: 1.979E-05 | global batch size: 32 | lm loss: 6.451063E+00 | loss scale: 32768.0 | grad norm: 352497.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3709/ 159576 | consumed samples: 71440 | elapsed time per iteration (ms): 14547.0 | learning rate: 1.980E-05 | global batch size: 32 | lm loss: 6.534979E+00 | loss scale: 32768.0 | grad norm: 139565.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3710/ 159576 | consumed samples: 71472 | elapsed time per iteration (ms): 14583.9 | learning rate: 1.981E-05 | global batch size: 32 | lm loss: 6.561714E+00 | loss scale: 32768.0 | grad norm: 190647.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3711/ 159576 | consumed samples: 71504 | elapsed time per iteration (ms): 14605.2 | learning rate: 1.982E-05 | global batch size: 32 | lm loss: 6.594619E+00 | loss scale: 32768.0 | grad norm: 159179.628 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3712/ 159576 | consumed samples: 71536 | elapsed time per iteration (ms): 14853.8 | learning rate: 1.982E-05 | global batch size: 32 | lm loss: 6.221584E+00 | loss scale: 32768.0 | grad norm: 163662.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3713/ 159576 | consumed samples: 71568 | elapsed time per iteration (ms): 14625.6 | learning rate: 1.983E-05 | global batch size: 32 | lm loss: 6.384083E+00 | loss scale: 32768.0 | grad norm: 157426.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3714/ 159576 | consumed samples: 71600 | elapsed time per iteration (ms): 14617.1 | learning rate: 1.984E-05 | global batch size: 32 | lm loss: 6.457389E+00 | loss scale: 32768.0 | grad norm: 163827.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3715/ 159576 | consumed samples: 71632 | elapsed time per iteration (ms): 14519.7 | learning rate: 1.985E-05 | global batch size: 32 | lm loss: 6.461262E+00 | loss scale: 32768.0 | grad norm: 150641.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3716/ 159576 | consumed samples: 71664 | elapsed time per iteration (ms): 14921.5 | learning rate: 1.986E-05 | global batch size: 32 | lm loss: 6.345608E+00 | loss scale: 32768.0 | grad norm: 146728.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3717/ 159576 | consumed samples: 71696 | elapsed time per iteration (ms): 14643.5 | learning rate: 1.987E-05 | global batch size: 32 | lm loss: 6.488680E+00 | loss scale: 32768.0 | grad norm: 159547.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3718/ 159576 | consumed samples: 71728 | elapsed time per iteration (ms): 14531.6 | learning rate: 1.988E-05 | global batch size: 32 | lm loss: 6.358843E+00 | loss scale: 32768.0 | grad norm: 120331.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3719/ 159576 | consumed samples: 71760 | elapsed time per iteration (ms): 14544.0 | learning rate: 1.989E-05 | global batch size: 32 | lm loss: 6.480108E+00 | loss scale: 32768.0 | grad norm: 136903.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3720/ 159576 | consumed samples: 71792 | elapsed time per iteration (ms): 14789.8 | learning rate: 1.989E-05 | global batch size: 32 | lm loss: 6.423407E+00 | loss scale: 32768.0 | grad norm: 144666.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3721/ 159576 | consumed samples: 71824 | elapsed time per iteration (ms): 14759.3 | learning rate: 1.990E-05 | global batch size: 32 | lm loss: 6.280478E+00 | loss scale: 32768.0 | grad norm: 131505.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3722/ 159576 | consumed samples: 71856 | elapsed time per iteration (ms): 14493.1 | learning rate: 1.991E-05 | global batch size: 32 | lm loss: 6.341520E+00 | loss scale: 32768.0 | grad norm: 153861.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3723/ 159576 | consumed samples: 71888 | elapsed time per iteration (ms): 14523.6 | learning rate: 1.992E-05 | global 
batch size: 32 | lm loss: 6.470270E+00 | loss scale: 32768.0 | grad norm: 129755.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3724/ 159576 | consumed samples: 71920 | elapsed time per iteration (ms): 14486.1 | learning rate: 1.993E-05 | global batch size: 32 | lm loss: 6.425168E+00 | loss scale: 32768.0 | grad norm: 117324.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3725/ 159576 | consumed samples: 71952 | elapsed time per iteration (ms): 14760.5 | learning rate: 1.994E-05 | global batch size: 32 | lm loss: 6.508280E+00 | loss scale: 32768.0 | grad norm: 128492.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3726/ 159576 | consumed samples: 71984 | elapsed time per iteration (ms): 14523.7 | learning rate: 1.995E-05 | global batch size: 32 | lm loss: 6.451111E+00 | loss scale: 32768.0 | grad norm: 167230.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3727/ 159576 | consumed samples: 72016 | elapsed time per iteration (ms): 14569.3 | learning rate: 1.996E-05 | global batch size: 32 | lm loss: 6.428119E+00 | loss scale: 32768.0 | grad norm: 118648.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3728/ 159576 | consumed samples: 72048 | elapsed time per iteration (ms): 14495.2 | learning rate: 1.997E-05 | global batch size: 32 | lm loss: 6.472005E+00 | loss scale: 32768.0 | grad norm: 129074.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3729/ 159576 | consumed samples: 72080 | elapsed time per iteration (ms): 14750.9 | learning rate: 1.997E-05 | global batch size: 32 | lm loss: 6.501527E+00 | loss scale: 32768.0 | grad norm: 149114.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3730/ 159576 | consumed samples: 72112 | elapsed time per iteration (ms): 14542.0 | learning rate: 1.998E-05 | global batch size: 32 | lm loss: 6.441484E+00 | loss scale: 32768.0 | grad norm: 115103.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3731/ 159576 | consumed samples: 72144 | elapsed time per iteration (ms): 14563.9 | learning rate: 1.999E-05 | global batch size: 32 | lm loss: 6.365570E+00 | loss scale: 32768.0 | grad norm: 122866.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3732/ 159576 | consumed samples: 72176 | elapsed time per iteration (ms): 14514.0 | learning rate: 2.000E-05 | global batch size: 32 | lm loss: 6.432354E+00 | loss scale: 32768.0 | grad norm: 117503.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3733/ 159576 | consumed samples: 72208 | elapsed time per iteration (ms): 14782.6 | learning rate: 2.001E-05 | global batch size: 32 | lm loss: 6.406446E+00 | loss scale: 32768.0 | grad norm: 118771.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3734/ 159576 | consumed samples: 72240 | elapsed time per iteration (ms): 14599.5 | learning rate: 2.002E-05 | global batch size: 32 | lm loss: 6.564467E+00 | loss scale: 32768.0 | grad norm: 113605.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3735/ 159576 | consumed samples: 72272 | elapsed time per iteration (ms): 14490.9 | learning rate: 2.003E-05 | global batch size: 32 | lm loss: 6.709463E+00 | loss scale: 32768.0 | grad norm: 143048.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3736/ 159576 | consumed samples: 72304 | elapsed time per iteration (ms): 14616.2 | learning rate: 2.004E-05 | global batch size: 32 | lm loss: 6.388952E+00 | loss scale: 32768.0 | grad norm: 148752.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3737/ 159576 | consumed samples: 72336 | elapsed time per iteration (ms): 14690.4 | learning rate: 2.005E-05 | global batch size: 32 | lm loss: 6.671305E+00 | loss scale: 32768.0 | grad norm: 167080.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3738/ 159576 | consumed samples: 72368 | elapsed time per iteration (ms): 14577.2 | learning rate: 2.005E-05 | global batch size: 32 | lm loss: 6.441625E+00 | loss scale: 32768.0 | grad norm: 132744.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3739/ 159576 | consumed samples: 72400 | elapsed time per iteration (ms): 14526.3 | learning rate: 2.006E-05 | global batch size: 32 | lm loss: 6.382997E+00 | loss scale: 32768.0 | grad norm: 137597.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3740/ 159576 | consumed samples: 72432 | elapsed time per iteration (ms): 14497.0 | learning rate: 2.007E-05 | global batch size: 32 | lm loss: 6.423009E+00 | loss scale: 32768.0 | grad norm: 158026.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3741/ 159576 | consumed samples: 72464 | elapsed time per iteration (ms): 14972.2 | learning rate: 2.008E-05 | global batch size: 32 | lm loss: 6.350714E+00 | loss scale: 32768.0 | grad norm: 133556.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3742/ 159576 | consumed samples: 72496 | elapsed time per iteration (ms): 14524.0 | learning rate: 2.009E-05 | global batch size: 32 | lm loss: 6.481720E+00 | loss scale: 32768.0 | grad norm: 111295.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3743/ 159576 | consumed samples: 72528 | elapsed time per iteration (ms): 14585.5 | learning rate: 2.010E-05 | global batch size: 32 | lm loss: 6.427812E+00 | loss scale: 32768.0 | grad norm: 147125.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3744/ 159576 | consumed samples: 72560 | elapsed time per iteration (ms): 14494.4 | learning rate: 2.011E-05 | global batch size: 32 | lm loss: 6.548944E+00 | loss scale: 32768.0 | grad norm: 157070.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3745/ 159576 | consumed samples: 72592 | elapsed time per iteration (ms): 14860.3 | learning rate: 2.012E-05 | global batch size: 32 | lm loss: 6.524699E+00 | loss scale: 32768.0 | grad norm: 133650.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3746/ 159576 | consumed samples: 72624 | elapsed time per iteration (ms): 14524.8 | learning rate: 2.013E-05 | global batch size: 32 | lm loss: 6.462801E+00 | loss scale: 32768.0 | grad norm: 145785.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3747/ 159576 | consumed samples: 72656 | elapsed time per iteration (ms): 14508.2 | learning rate: 2.013E-05 | global batch size: 32 | lm loss: 6.505124E+00 | loss scale: 32768.0 | grad norm: 159039.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3748/ 159576 | consumed samples: 72688 | elapsed time per iteration (ms): 14534.8 | learning rate: 2.014E-05 | global batch size: 32 | lm loss: 6.554813E+00 | loss scale: 32768.0 | grad norm: 144107.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3749/ 159576 | consumed samples: 72720 | elapsed time per iteration (ms): 14885.2 | learning rate: 2.015E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 32768.0 | grad norm: 139312.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3750/ 159576 | consumed samples: 72752 | elapsed time per iteration (ms): 14531.0 | learning rate: 2.016E-05 | global batch size: 32 | lm loss: 6.393044E+00 | loss scale: 32768.0 | grad norm: 177829.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3751/ 159576 | consumed samples: 72784 | elapsed time per iteration (ms): 14500.7 | learning rate: 2.017E-05 | global batch size: 32 | lm loss: 6.362189E+00 | loss scale: 32768.0 | grad norm: 176679.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3752/ 159576 | consumed samples: 72816 | elapsed time per iteration (ms): 14533.8 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.594802E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3753/ 159576 | consumed samples: 72848 | elapsed time per iteration (ms): 7743.9 | learning rate: 2.018E-05 | global batch size: 32 | lm loss: 6.535247E+00 | loss scale: 32768.0 | grad norm: 172136.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3754/ 159576 | consumed samples: 72880 | elapsed time per iteration (ms): 14383.1 | learning rate: 2.019E-05 | global batch size: 32 | lm loss: 6.354399E+00 | loss scale: 32768.0 | grad norm: 126648.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3755/ 159576 | consumed samples: 72912 | elapsed time per iteration (ms): 14590.3 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.473662E+00 | loss scale: 32768.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3756/ 159576 | consumed samples: 72944 | elapsed time per iteration (ms): 7767.7 | learning rate: 2.020E-05 | global batch size: 32 | lm loss: 6.609807E+00 | loss scale: 16384.0 | grad norm: 156295.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3757/ 159576 | consumed samples: 72976 | elapsed time per iteration (ms): 14046.4 | learning rate: 2.021E-05 | global batch size: 32 | lm loss: 6.389218E+00 | loss scale: 16384.0 | grad norm: 71738.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
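Note the two short iterations above: 3753 and 3756 complete in roughly half the usual time (7743.9 ms and 7767.7 ms), each repeating the previous iteration's grad norm, and at 3756 the loss scale drops from 32768.0 to 16384.0. This is consistent with the fp16 dynamic loss scaler detecting an overflow, cutting the step short, and backing the scale off. A minimal sketch of that backoff/regrowth logic, assuming a generic mixed-precision trainer (class and parameter names are illustrative, not the project's actual code):

# Sketch of dynamic fp16 loss scaling (illustrative, not the training code):
# on overflow the step is not applied and the scale is halved, matching the
# 32768.0 -> 16384.0 drop logged at iteration 3756; after a window of stable
# steps the scaler tries growing again.
class DynamicLossScaler:
    def __init__(self, init_scale=32768.0, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._stable_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if found_overflow:
            self.scale = max(self.scale / 2.0, 1.0)  # back off: 32768 -> 16384
            self._stable_steps = 0
            return False
        self._stable_steps += 1
        if self._stable_steps % self.growth_interval == 0:
            self.scale *= 2.0  # attempt a larger scale after stable steps
        return True

scaler = DynamicLossScaler()
assert scaler.update(found_overflow=True) is False and scaler.scale == 16384.0

The halved scale then persists through the following records until enough stable steps accumulate for the scaler to grow it again.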
iteration 3758/ 159576 | consumed samples: 73008 | elapsed time per iteration (ms): 14805.7 | learning rate: 2.021E-05 | global batch size: 32 | lm loss: 6.361919E+00 | loss scale: 16384.0 | grad norm: 60700.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3759/ 159576 | consumed samples: 73040 | elapsed time per iteration (ms): 14722.8 | learning rate: 2.022E-05 | global batch size: 32 | lm loss: 6.447733E+00 | loss scale: 16384.0 | grad norm: 87663.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3760/ 159576 | consumed samples: 73072 | elapsed time per iteration (ms): 14583.0 | learning rate: 2.023E-05 | global batch size: 32 | lm loss: 6.446470E+00 | loss scale: 16384.0 | grad norm: 67781.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3761/ 159576 | consumed samples: 73104 | elapsed time per iteration (ms): 14493.9 | learning rate: 2.024E-05 | global batch size: 32 | lm loss: 6.378415E+00 | loss scale: 16384.0 | grad norm: 72177.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3762/ 159576 | consumed samples: 73136 | elapsed time per iteration (ms): 14567.8 | learning rate: 2.025E-05 | global batch size: 32 | lm loss: 6.576702E+00 | loss scale: 16384.0 | grad norm: 87501.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3763/ 159576 | consumed samples: 73168 | elapsed time per iteration (ms): 14732.6 | learning rate: 2.026E-05 | global batch size: 32 | lm loss: 6.522850E+00 | loss scale: 16384.0 | grad norm: 66784.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3764/ 159576 | consumed samples: 73200 | elapsed time per iteration (ms): 14572.5 | learning rate: 2.027E-05 | global batch size: 32 | lm loss: 6.361198E+00 | loss scale: 16384.0 | grad norm: 85761.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3765/ 159576 | consumed samples: 73232 | elapsed time per iteration (ms): 14647.5 | learning rate: 2.028E-05 | global batch size: 32 | lm loss: 6.605127E+00 | loss scale: 16384.0 | grad norm: 69863.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3766/ 159576 | consumed samples: 73264 | elapsed time per iteration (ms): 14606.0 | learning rate: 2.029E-05 | global batch size: 32 | lm loss: 6.398610E+00 | loss scale: 16384.0 | grad norm: 94809.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3767/ 159576 | consumed samples: 73296 | elapsed time per iteration (ms): 14708.7 | learning rate: 2.029E-05 | global batch size: 32 | lm loss: 6.484084E+00 | loss scale: 16384.0 | grad norm: 74741.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3768/ 159576 | consumed samples: 73328 | elapsed time per iteration (ms): 14555.4 | learning rate: 2.030E-05 | global batch size: 32 | lm loss: 6.496735E+00 | loss scale: 16384.0 | grad norm: 77000.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3769/ 159576 | consumed samples: 73360 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.031E-05 | global batch size: 32 | lm loss: 6.386226E+00 | loss scale: 16384.0 | grad norm: 92155.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3770/ 159576 | consumed samples: 73392 | elapsed time per iteration (ms): 14623.6 | learning rate: 2.032E-05 | global batch size: 32 | lm loss: 6.446381E+00 | loss scale: 16384.0 | grad norm: 91554.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3771/ 159576 | consumed samples: 73424 | elapsed time per iteration (ms): 14736.8 | learning rate: 2.033E-05 | global batch size: 32 | lm loss: 6.477424E+00 | loss scale: 16384.0 | grad norm: 79287.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3772/ 159576 | consumed samples: 73456 | elapsed time per iteration (ms): 14586.8 | learning rate: 2.034E-05 | global batch size: 32 | lm loss: 6.505037E+00 | loss scale: 16384.0 | grad norm: 76395.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3773/ 159576 | consumed samples: 73488 | elapsed time per iteration (ms): 14638.2 | learning rate: 2.035E-05 | global batch size: 32 | lm loss: 6.536213E+00 | loss scale: 16384.0 | grad norm: 64411.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3774/ 159576 | consumed samples: 73520 | elapsed time per iteration (ms): 14533.1 | learning rate: 2.036E-05 | global batch size: 32 | lm loss: 6.477271E+00 | loss scale: 16384.0 | grad norm: 79531.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3775/ 159576 | consumed samples: 73552 | elapsed time per iteration (ms): 14956.5 | learning rate: 2.037E-05 | global batch size: 32 | lm loss: 6.364020E+00 | loss scale: 16384.0 | grad norm: 72312.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3776/ 159576 | consumed samples: 73584 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.037E-05 | global batch size: 32 | lm loss: 6.331044E+00 | loss scale: 16384.0 | grad norm: 84164.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3777/ 159576 | consumed samples: 73616 | elapsed time per iteration (ms): 14594.9 | learning rate: 2.038E-05 | global batch size: 32 | lm loss: 6.512950E+00 | loss scale: 16384.0 | grad norm: 77822.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3778/ 159576 | consumed samples: 73648 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.039E-05 | global batch size: 32 | lm loss: 6.549839E+00 | loss scale: 16384.0 | grad norm: 66443.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3779/ 159576 | consumed samples: 73680 | elapsed time per iteration (ms): 14999.4 | learning rate: 2.040E-05 | global batch size: 32 | lm loss: 6.475536E+00 | loss scale: 16384.0 | grad norm: 88572.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3780/ 159576 | consumed samples: 73712 | elapsed time per iteration (ms): 14681.3 | learning rate: 2.041E-05 | global batch size: 32 | lm loss: 6.548042E+00 | loss scale: 16384.0 | grad norm: 74648.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3781/ 159576 | consumed samples: 73744 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.042E-05 | global batch size: 32 | lm loss: 6.445394E+00 | loss scale: 16384.0 | grad norm: 79663.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3782/ 159576 | consumed samples: 73776 | elapsed time per iteration (ms): 14624.0 | learning rate: 2.043E-05 | global batch size: 32 | lm loss: 6.496744E+00 | loss scale: 16384.0 | grad norm: 77740.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3783/ 159576 | consumed samples: 73808 | elapsed time per iteration (ms): 15155.7 | learning rate: 2.044E-05 | global batch size: 32 | lm loss: 6.402834E+00 | loss scale: 16384.0 | grad norm: 74857.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3784/ 159576 | consumed samples: 73840 | elapsed time per iteration (ms): 14584.9 | learning rate: 2.045E-05 | global batch size: 32 | lm loss: 6.375038E+00 | loss scale: 16384.0 | grad norm: 86117.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3785/ 159576 | consumed samples: 73872 | elapsed time per iteration (ms): 14634.8 | learning rate: 2.045E-05 | global batch size: 32 | lm loss: 6.507965E+00 | loss scale: 16384.0 | grad norm: 78691.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3786/ 159576 | consumed samples: 73904 | elapsed time per iteration (ms): 14635.7 | learning rate: 2.046E-05 | global batch size: 32 | lm loss: 6.375463E+00 | loss scale: 16384.0 | grad norm: 105222.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3787/ 159576 | consumed samples: 73936 | elapsed time per iteration (ms): 14981.3 | learning rate: 2.047E-05 | global batch size: 32 | lm loss: 6.494486E+00 | loss scale: 16384.0 | grad norm: 70745.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3788/ 159576 | consumed samples: 73968 | elapsed time per iteration (ms): 14576.6 | learning rate: 2.048E-05 | global batch size: 32 | lm loss: 6.350873E+00 | loss scale: 16384.0 | grad norm: 81350.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3789/ 159576 | consumed samples: 74000 | elapsed time per iteration (ms): 14674.5 | learning rate: 2.049E-05 | global batch size: 32 | lm loss: 6.467069E+00 | loss scale: 16384.0 | grad norm: 84086.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3790/ 159576 | consumed samples: 74032 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.050E-05 | global batch size: 32 | lm loss: 6.420381E+00 | loss scale: 16384.0 | grad norm: 79517.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3791/ 159576 | consumed samples: 74064 | elapsed time per iteration (ms): 14845.4 | learning rate: 2.051E-05 | global batch size: 32 | lm loss: 6.528859E+00 | loss scale: 16384.0 | grad norm: 87747.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3792/ 159576 | consumed samples: 74096 | elapsed time per iteration (ms): 14671.9 | learning rate: 2.052E-05 | global batch size: 32 | lm loss: 6.445452E+00 | loss scale: 16384.0 | grad norm: 76185.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3793/ 159576 | consumed samples: 74128 | elapsed time per iteration (ms): 14614.2 | learning rate: 2.053E-05 | global batch size: 32 | lm loss: 6.579043E+00 | loss scale: 16384.0 | grad norm: 85891.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3794/ 159576 | consumed samples: 74160 | elapsed time per iteration (ms): 14636.7 | learning rate: 2.053E-05 | global batch size: 32 | lm loss: 6.481782E+00 | loss scale: 16384.0 | grad norm: 62633.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3795/ 159576 | consumed samples: 74192 | elapsed time per iteration (ms): 14963.5 | learning rate: 2.054E-05 | global batch size: 32 | lm loss: 6.517486E+00 | loss scale: 16384.0 | grad norm: 67403.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3796/ 159576 | consumed samples: 74224 | elapsed time per iteration (ms): 14620.1 | learning rate: 2.055E-05 | global batch size: 32 | lm loss: 6.417095E+00 | loss scale: 16384.0 | grad norm: 62157.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3797/ 159576 | consumed samples: 74256 | elapsed time per iteration (ms): 14620.8 | learning rate: 2.056E-05 | global batch size: 32 | lm loss: 6.419306E+00 | loss scale: 16384.0 | grad norm: 73456.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3798/ 159576 | consumed samples: 74288 | elapsed time per iteration (ms): 14577.9 | learning rate: 2.057E-05 | global batch size: 32 | lm loss: 6.487021E+00 | loss scale: 16384.0 | grad norm: 67613.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3799/ 159576 | consumed samples: 74320 | elapsed time per iteration (ms): 14963.8 | learning rate: 2.058E-05 | global batch size: 32 | lm loss: 6.459682E+00 | loss scale: 16384.0 | grad norm: 73515.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3800/ 159576 | consumed samples: 74352 | elapsed time per iteration (ms): 14567.9 | learning rate: 2.059E-05 | global batch size: 32 | lm loss: 6.321566E+00 | loss scale: 16384.0 | grad norm: 77546.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3801/ 159576 | consumed samples: 74384 | elapsed time per iteration (ms): 14600.7 | learning rate: 2.060E-05 | global batch size: 32 | lm loss: 6.582398E+00 | loss scale: 16384.0 | grad norm: 78424.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3802/ 159576 | consumed samples: 74416 | elapsed time per iteration (ms): 14644.4 | learning rate: 2.061E-05 | global batch size: 32 | lm loss: 6.394701E+00 | loss scale: 16384.0 | grad norm: 82174.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3803/ 159576 | consumed samples: 74448 | elapsed time per iteration (ms): 14905.7 | learning rate: 2.061E-05 | global batch size: 32 | lm loss: 6.388845E+00 | loss scale: 16384.0 | grad norm: 67050.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3804/ 159576 | consumed samples: 74480 | elapsed time per iteration (ms): 14636.0 | learning rate: 2.062E-05 | global batch size: 32 | lm loss: 6.513092E+00 | loss scale: 16384.0 | grad norm: 118423.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3805/ 159576 | consumed samples: 74512 | elapsed time per iteration (ms): 14511.9 | learning rate: 2.063E-05 | global batch size: 32 | lm loss: 6.418696E+00 | loss scale: 16384.0 | grad norm: 71096.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3806/ 159576 | consumed samples: 74544 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.064E-05 | global batch size: 32 | lm loss: 6.286570E+00 | loss scale: 16384.0 | grad norm: 93004.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3807/ 159576 | consumed samples: 74576 | elapsed time per iteration (ms): 14509.8 | learning rate: 2.065E-05 | global batch size: 32 | lm loss: 6.565314E+00 | loss scale: 16384.0 | grad norm: 76207.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3808/ 159576 | consumed samples: 74608 | elapsed time per iteration (ms): 15001.7 | learning rate: 2.066E-05 | global batch size: 32 | lm loss: 6.597963E+00 | loss scale: 16384.0 | grad norm: 136405.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3809/ 159576 | consumed samples: 74640 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.067E-05 | global batch size: 32 | lm loss: 6.619783E+00 | loss scale: 16384.0 | grad norm: 75270.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3810/ 159576 | consumed samples: 74672 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.068E-05 | global batch size: 32 | lm loss: 6.406981E+00 | loss scale: 16384.0 | grad norm: 81052.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3811/ 159576 | consumed samples: 74704 | elapsed time per iteration (ms): 14512.1 | learning rate: 2.068E-05 | global batch size: 32 | lm loss: 6.487488E+00 | loss scale: 16384.0 | grad norm: 87400.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3812/ 159576 | consumed samples: 74736 | elapsed time per iteration (ms): 14767.4 | learning rate: 2.069E-05 | global batch size: 32 | lm loss: 6.416305E+00 | loss scale: 16384.0 | grad norm: 104809.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3813/ 159576 | consumed samples: 74768 | elapsed time per iteration (ms): 14457.6 | learning rate: 2.070E-05 | global batch size: 32 | lm loss: 6.405777E+00 | loss scale: 16384.0 | grad norm: 79282.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3814/ 159576 | consumed samples: 74800 | elapsed time per iteration (ms): 14520.7 | learning rate: 2.071E-05 | global batch size: 32 | lm loss: 6.435395E+00 | loss scale: 16384.0 | grad norm: 75788.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3815/ 159576 | consumed samples: 74832 | elapsed time per iteration (ms): 14520.3 | learning rate: 2.072E-05 | global batch size: 32 | lm loss: 6.324138E+00 | loss scale: 16384.0 | grad norm: 77448.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3816/ 159576 | consumed samples: 74864 | elapsed time per iteration (ms): 14756.0 | learning rate: 2.073E-05 | global batch size: 32 | lm loss: 6.479269E+00 | loss scale: 16384.0 | grad norm: 80928.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3817/ 159576 | consumed samples: 74896 | elapsed time per iteration (ms): 14631.8 | learning rate: 2.074E-05 | global batch size: 32 | lm loss: 6.448977E+00 | loss scale: 16384.0 | grad norm: 81667.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3818/ 159576 | consumed samples: 74928 | elapsed time per iteration (ms): 14631.1 | learning rate: 2.075E-05 | global batch size: 32 | lm loss: 6.550106E+00 | loss scale: 16384.0 | grad norm: 65592.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3819/ 159576 | consumed samples: 74960 | elapsed time per iteration (ms): 14596.0 | learning rate: 2.076E-05 | global batch size: 32 | lm loss: 6.589513E+00 | loss scale: 16384.0 | grad norm: 93606.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3820/ 159576 | consumed samples: 74992 | elapsed time per iteration (ms): 14800.0 | learning rate: 2.076E-05 | global batch size: 32 | lm loss: 6.472552E+00 | loss scale: 16384.0 | grad norm: 63974.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3821/ 159576 | consumed samples: 75024 | elapsed time per iteration (ms): 14588.9 | learning rate: 2.077E-05 | global batch size: 32 | lm loss: 6.366886E+00 | loss scale: 16384.0 | grad norm: 87736.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3822/ 159576 | consumed samples: 75056 | elapsed time per iteration (ms): 14606.9 | learning rate: 2.078E-05 | global batch size: 32 | lm loss: 6.523769E+00 | loss scale: 16384.0 | grad norm: 81803.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3823/ 159576 | consumed samples: 75088 | elapsed time per iteration (ms): 14588.5 | learning rate: 2.079E-05 | global batch size: 32 | lm loss: 6.495326E+00 | loss scale: 16384.0 | grad norm: 63058.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3824/ 159576 | consumed samples: 75120 | elapsed time per iteration (ms): 14986.3 | learning rate: 2.080E-05 | global batch size: 32 | lm loss: 6.557096E+00 | loss scale: 16384.0 | grad norm: 80592.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3825/ 159576 | consumed samples: 75152 | elapsed time per iteration (ms): 14684.5 | learning rate: 2.081E-05 | global batch size: 32 | lm loss: 6.436917E+00 | loss scale: 16384.0 | grad norm: 70196.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3826/ 159576 | consumed samples: 75184 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.082E-05 | global batch size: 32 | lm loss: 6.413104E+00 | loss scale: 16384.0 | grad norm: 81085.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 17:07:03] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 17:07:03] PULSE: tr8-104B is running for 11:14:52 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 3827/ 159576 | consumed samples: 75216 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.083E-05 | global batch size: 32 | lm loss: 6.427704E+00 | loss scale: 16384.0 | grad norm: 68943.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3828/ 159576 | consumed samples: 75248 | elapsed time per iteration (ms): 14988.1 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.347779E+00 | loss scale: 16384.0 | grad norm: 64095.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3829/ 159576 | consumed samples: 75280 | elapsed time per iteration (ms): 14665.9 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.411919E+00 | loss scale: 16384.0 | grad norm: 82008.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3830/ 159576 | consumed samples: 75312 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.085E-05 | global batch size: 32 | lm loss: 6.458866E+00 | loss scale: 16384.0 | grad norm: 67971.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3831/ 159576 | consumed samples: 75344 | elapsed time per iteration (ms): 14600.2 | learning rate: 2.086E-05 | global batch size: 32 | lm loss: 6.450158E+00 | loss scale: 16384.0 | grad norm: 59376.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3832/ 159576 | consumed samples: 75376 | elapsed time per iteration (ms): 14931.8 | learning rate: 2.087E-05 | global batch size: 32 | lm loss: 6.537256E+00 | loss scale: 16384.0 | grad norm: 77538.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3833/ 159576 | consumed samples: 75408 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.088E-05 | global batch size: 32 | lm loss: 6.392985E+00 | loss scale: 16384.0 | grad norm: 84275.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3834/ 159576 | consumed samples: 75440 | elapsed time per iteration (ms): 14616.6 | learning rate: 2.089E-05 | global batch size: 32 | lm loss: 6.512251E+00 | loss scale: 16384.0 | grad norm: 80167.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3835/ 159576 | consumed samples: 75472 | elapsed time per iteration (ms): 14584.0 | learning rate: 2.090E-05 | global batch size: 32 | lm loss: 6.467295E+00 | loss scale: 16384.0 | grad norm: 85124.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3836/ 159576 | consumed samples: 75504 | elapsed time per iteration (ms): 14844.3 | learning rate: 2.091E-05 | global batch size: 32 | lm loss: 6.514040E+00 | loss scale: 16384.0 | grad norm: 71539.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3837/ 159576 | consumed samples: 75536 | elapsed time per iteration (ms): 14618.8 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.519591E+00 | loss scale: 16384.0 | grad norm: 89173.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
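The two PULSE lines interleaved with the records are the watchdog reporting SLURM state: the current job (1162855_1) has been running for 11:14:52 on the listed gpu_p13 nodes, while the next job array (1165978_[1-10%1]) waits behind it via the dependency mechanism. When working with a span of records like this, it can help to pull them into structured form; a hypothetical parser sketch (the regex and field names are ours, not part of the training code):

import re

# Hypothetical parser for iteration records like the ones above; illustrative
# only, not part of the training code.
RECORD = re.compile(
    r"iteration\s+(?P<iter>\d+)/\s*\d+ \| consumed samples:\s+(?P<samples>\d+)"
    r" \| elapsed time per iteration \(ms\): (?P<ms>[\d.]+)"
    r" \| learning rate: (?P<lr>[\dE.+-]+)"
    r" \| global batch size:\s+(?P<bs>\d+)"
    r" \| lm loss: (?P<loss>[\dE.+-]+)"
    r" \| loss scale: (?P<scale>[\d.]+)"
    r" \| grad norm: (?P<gnorm>[\d.]+)"
)

def parse(line: str):
    m = RECORD.search(line)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None

rec = parse("iteration 3756/ 159576 | consumed samples: 72944 | "
            "elapsed time per iteration (ms): 7767.7 | learning rate: 2.020E-05 | "
            "global batch size: 32 | lm loss: 6.609807E+00 | loss scale: 16384.0 | "
            "grad norm: 156295.152")
assert rec is not None and rec["scale"] == 16384.0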
iteration 3838/ 159576 | consumed samples: 75568 | elapsed time per iteration (ms): 14566.0 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.447284E+00 | loss scale: 16384.0 | grad norm: 86030.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3839/ 159576 | consumed samples: 75600 | elapsed time per iteration (ms): 14636.3 | learning rate: 2.093E-05 | global batch size: 32 | lm loss: 6.369718E+00 | loss scale: 16384.0 | grad norm: 66275.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3840/ 159576 | consumed samples: 75632 | elapsed time per iteration (ms): 14897.9 | learning rate: 2.094E-05 | global batch size: 32 | lm loss: 6.467171E+00 | loss scale: 16384.0 | grad norm: 82043.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3841/ 159576 | consumed samples: 75664 | elapsed time per iteration (ms): 14554.8 | learning rate: 2.095E-05 | global batch size: 32 | lm loss: 6.458669E+00 | loss scale: 16384.0 | grad norm: 73761.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3842/ 159576 | consumed samples: 75696 | elapsed time per iteration (ms): 14564.2 | learning rate: 2.096E-05 | global batch size: 32 | lm loss: 6.516797E+00 | loss scale: 16384.0 | grad norm: 83647.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3843/ 159576 | consumed samples: 75728 | elapsed time per iteration (ms): 14464.9 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.381551E+00 | loss scale: 16384.0 | grad norm: 58297.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3844/ 159576 | consumed samples: 75760 | elapsed time per iteration (ms): 14942.4 | learning rate: 2.098E-05 | global batch size: 32 | lm loss: 6.471825E+00 | loss scale: 16384.0 | grad norm: 82881.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3845/ 159576 | consumed samples: 75792 | elapsed time per iteration (ms): 14531.3 | learning rate: 2.099E-05 | global batch size: 32 | lm loss: 6.528457E+00 | loss scale: 16384.0 | grad norm: 67296.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3846/ 159576 | consumed samples: 75824 | elapsed time per iteration (ms): 14601.9 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.408827E+00 | loss scale: 16384.0 | grad norm: 67512.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3847/ 159576 | consumed samples: 75856 | elapsed time per iteration (ms): 14580.2 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.440091E+00 | loss scale: 16384.0 | grad norm: 78400.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3848/ 159576 | consumed samples: 75888 | elapsed time per iteration (ms): 14911.9 | learning rate: 2.101E-05 | global batch size: 32 | lm loss: 6.374573E+00 | loss scale: 16384.0 | grad norm: 85886.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3849/ 159576 | consumed samples: 75920 | elapsed time per iteration (ms): 14768.3 | learning rate: 2.102E-05 | global batch size: 32 | lm loss: 6.529835E+00 | loss scale: 16384.0 | grad norm: 71394.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3850/ 159576 | consumed samples: 75952 | elapsed time per iteration (ms): 14553.3 | learning rate: 2.103E-05 | global batch size: 32 | lm loss: 6.455585E+00 | loss scale: 16384.0 | grad norm: 67772.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3851/ 159576 | consumed samples: 75984 | elapsed time per iteration (ms): 14574.9 | learning rate: 2.104E-05 | global batch size: 32 | lm loss: 6.428284E+00 | loss scale: 16384.0 | grad norm: 110864.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3852/ 159576 | consumed samples: 76016 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.105E-05 | global batch size: 32 | lm loss: 6.457644E+00 | loss scale: 16384.0 | grad norm: 73499.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3853/ 159576 | consumed samples: 76048 | elapsed time per iteration (ms): 14780.7 | learning rate: 2.106E-05 | global batch size: 32 | lm loss: 6.459057E+00 | loss scale: 16384.0 | grad norm: 71503.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3854/ 159576 | consumed samples: 76080 | elapsed time per iteration (ms): 14631.9 | learning rate: 2.107E-05 | global batch size: 32 | lm loss: 6.522111E+00 | loss scale: 16384.0 | grad norm: 73205.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3855/ 159576 | consumed samples: 76112 | elapsed time per iteration (ms): 14685.7 | learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.444643E+00 | loss scale: 16384.0 | grad norm: 70169.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3856/ 159576 | consumed samples: 76144 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.392300E+00 | loss scale: 16384.0 | grad norm: 81224.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3857/ 159576 | consumed samples: 76176 | elapsed time per iteration (ms): 14734.9 | learning rate: 2.109E-05 | global batch size: 32 | lm loss: 6.474737E+00 | loss scale: 16384.0 | grad norm: 76429.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3858/ 159576 | consumed samples: 76208 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.110E-05 | global batch size: 32 | lm loss: 6.481500E+00 | loss scale: 16384.0 | grad norm: 76288.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3859/ 159576 | consumed samples: 76240 | elapsed time per iteration (ms): 14536.6 | learning rate: 2.111E-05 | global batch size: 32 | lm loss: 6.504058E+00 | loss scale: 16384.0 | grad norm: 75104.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3860/ 159576 | consumed samples: 76272 | elapsed time per iteration (ms): 14557.4 | learning rate: 2.112E-05 | global batch size: 32 | lm loss: 6.616935E+00 | loss scale: 16384.0 | grad norm: 73471.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3861/ 159576 | consumed samples: 76304 | elapsed time per iteration (ms): 14996.3 | learning rate: 2.113E-05 | global batch size: 32 | lm loss: 6.437632E+00 | loss scale: 16384.0 | grad norm: 100626.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3862/ 159576 | consumed samples: 76336 | elapsed time per iteration (ms): 14610.8 | learning rate: 2.114E-05 | global batch size: 32 | lm loss: 6.358921E+00 | loss scale: 16384.0 | grad norm: 84367.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3863/ 159576 | consumed samples: 76368 | elapsed time per iteration (ms): 14574.0 | learning rate: 2.115E-05 | global batch size: 32 | lm loss: 6.489450E+00 | loss scale: 16384.0 | grad norm: 111308.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3864/ 159576 | consumed samples: 76400 | elapsed time per iteration (ms): 14585.8 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.579299E+00 | loss scale: 16384.0 | grad norm: 71685.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3865/ 159576 | consumed samples: 76432 | elapsed time per iteration (ms): 14801.5 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.356242E+00 | loss scale: 16384.0 | grad norm: 68636.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3866/ 159576 | consumed samples: 76464 | elapsed time per iteration (ms): 14581.8 | learning rate: 2.117E-05 | global batch size: 32 | lm loss: 6.583051E+00 | loss scale: 16384.0 | grad norm: 83498.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3867/ 159576 | consumed samples: 76496 | elapsed time per iteration (ms): 14548.1 | learning rate: 2.118E-05 | global batch size: 32 | lm loss: 6.414474E+00 | loss scale: 16384.0 | grad norm: 70120.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3868/ 159576 | consumed samples: 76528 | elapsed time per iteration (ms): 14581.2 | learning rate: 2.119E-05 | global batch size: 32 | lm loss: 6.383676E+00 | loss scale: 16384.0 | grad norm: 65625.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3869/ 159576 | consumed samples: 76560 | elapsed time per iteration (ms): 14975.0 | learning rate: 2.120E-05 | global batch size: 32 | lm loss: 6.553302E+00 | loss scale: 16384.0 | grad norm: 78443.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3870/ 159576 | consumed samples: 76592 | elapsed time per iteration (ms): 14654.1 | learning rate: 2.121E-05 | global batch size: 32 | lm loss: 6.525763E+00 | loss scale: 16384.0 | grad norm: 74575.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3871/ 159576 | consumed samples: 76624 | elapsed time per iteration (ms): 14658.5 | learning rate: 2.122E-05 | global batch size: 32 | lm loss: 6.416959E+00 | loss scale: 16384.0 | grad norm: 61001.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3872/ 159576 | consumed samples: 76656 | elapsed time per iteration (ms): 14544.3 | learning rate: 2.123E-05 | global batch size: 32 | lm loss: 6.516649E+00 | loss scale: 16384.0 | grad norm: 76582.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3873/ 159576 | consumed samples: 76688 | elapsed time per iteration (ms): 14961.2 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.532383E+00 | loss scale: 16384.0 | grad norm: 98540.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3874/ 159576 | consumed samples: 76720 | elapsed time per iteration (ms): 14595.7 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.589262E+00 | loss scale: 16384.0 | grad norm: 90020.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3875/ 159576 | consumed samples: 76752 | elapsed time per iteration (ms): 14549.8 | learning rate: 2.125E-05 | global batch size: 32 | lm loss: 6.475612E+00 | loss scale: 16384.0 | grad norm: 71253.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3876/ 159576 | consumed samples: 76784 | elapsed time per iteration (ms): 14539.7 | learning rate: 2.126E-05 | global batch size: 32 | lm loss: 6.477540E+00 | loss scale: 16384.0 | grad norm: 113904.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3877/ 159576 | consumed samples: 76816 | elapsed time per iteration (ms): 14922.4 | learning rate: 2.127E-05 | global batch size: 32 | lm loss: 6.475825E+00 | loss scale: 16384.0 | grad norm: 59736.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3878/ 159576 | consumed samples: 76848 | elapsed time per iteration (ms): 14676.0 | learning rate: 2.128E-05 | global batch size: 32 | lm loss: 6.477038E+00 | loss scale: 16384.0 | grad norm: 73926.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3879/ 159576 | consumed samples: 76880 | elapsed time per iteration (ms): 14505.4 | learning rate: 2.129E-05 | global batch size: 32 | lm loss: 6.577363E+00 | loss scale: 16384.0 | grad norm: 65273.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3880/ 159576 | consumed samples: 76912 | elapsed time per iteration (ms): 14525.2 | learning rate: 2.130E-05 | global batch size: 32 | lm loss: 6.431276E+00 | loss scale: 16384.0 | grad norm: 62353.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3881/ 159576 | consumed samples: 76944 | elapsed time per iteration (ms): 14918.9 | learning rate: 2.131E-05 | global batch size: 32 | lm loss: 6.471975E+00 | loss scale: 16384.0 | grad norm: 80402.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3882/ 159576 | consumed samples: 76976 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.481179E+00 | loss scale: 16384.0 | grad norm: 59241.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3883/ 159576 | consumed samples: 77008 | elapsed time per iteration (ms): 14519.1 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.356431E+00 | loss scale: 16384.0 | grad norm: 66124.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3884/ 159576 | consumed samples: 77040 | elapsed time per iteration (ms): 14635.6 | learning rate: 2.133E-05 | global batch size: 32 | lm loss: 7.171796E+00 | loss scale: 16384.0 | grad norm: 628102.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3885/ 159576 | consumed samples: 77072 | elapsed time per iteration (ms): 14877.6 | learning rate: 2.134E-05 | global batch size: 32 | lm loss: 7.122965E+00 | loss scale: 16384.0 | grad norm: 105361.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3886/ 159576 | consumed samples: 77104 | elapsed time per iteration (ms): 14581.7 | learning rate: 2.135E-05 | global batch size: 32 | lm loss: 6.781033E+00 | loss scale: 16384.0 | grad norm: 90805.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3887/ 159576 | consumed samples: 77136 | elapsed time per iteration (ms): 14580.5 | learning rate: 2.136E-05 | global batch size: 32 | lm loss: 6.824611E+00 | loss scale: 16384.0 | grad norm: 128888.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3888/ 159576 | consumed samples: 77168 | elapsed time per iteration (ms): 14468.4 | learning rate: 2.137E-05 | global batch size: 32 | lm loss: 6.773994E+00 | loss scale: 16384.0 | grad norm: 67441.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3889/ 159576 | consumed samples: 77200 | elapsed time per iteration (ms): 14934.3 | learning rate: 2.138E-05 | global batch size: 32 | lm loss: 6.845183E+00 | loss scale: 16384.0 | grad norm: 171660.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3890/ 159576 | consumed samples: 77232 | elapsed time per iteration (ms): 14531.8 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.803124E+00 | loss scale: 16384.0 | grad norm: 100767.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3891/ 159576 | consumed samples: 77264 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.825951E+00 | loss scale: 16384.0 | grad norm: 84326.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3892/ 159576 | consumed samples: 77296 | elapsed time per iteration (ms): 14543.8 | learning rate: 2.140E-05 | global batch size: 32 | lm loss: 6.734772E+00 | loss scale: 16384.0 | grad norm: 87236.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3893/ 159576 | consumed samples: 77328 | elapsed time per iteration (ms): 14607.7 | learning rate: 2.141E-05 | global batch size: 32 | lm loss: 6.789660E+00 | loss scale: 16384.0 | grad norm: 88054.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3894/ 159576 | consumed samples: 77360 | elapsed time per iteration (ms): 14920.9 | learning rate: 2.142E-05 | global batch size: 32 | lm loss: 6.710454E+00 | loss scale: 16384.0 | grad norm: 182978.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3895/ 159576 | consumed samples: 77392 | elapsed time per iteration (ms): 14510.2 | learning rate: 2.143E-05 | global batch size: 32 | lm loss: 6.691602E+00 | loss scale: 16384.0 | grad norm: 119037.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
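Iteration 3884 is a clear instability: lm loss jumps from ~6.36 to 7.171796E+00 and the grad norm spikes to 628102.297, roughly an order of magnitude above its recent level, before both drift back down over the following dozen iterations. One simple way to flag such events when scanning these logs, sketched with arbitrary illustrative thresholds:

from statistics import median

# Flag grad-norm spikes like iteration 3884 by comparing each value to the
# median of a trailing window (window and ratio are illustrative choices).
def flag_spikes(grad_norms, window=20, ratio=4.0):
    spikes = []
    for i in range(window, len(grad_norms)):
        baseline = median(grad_norms[i - window:i])
        if grad_norms[i] > ratio * baseline:
            spikes.append(i)
    return spikes

# With the records above, 628102.297 sits far beyond 4x the ~70k-90k baseline,
# so it would be flagged; the 1e5-2e5 wobbles afterwards mostly would not be.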
iteration 3896/ 159576 | consumed samples: 77424 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.144E-05 | global batch size: 32 | lm loss: 6.739342E+00 | loss scale: 16384.0 | grad norm: 97461.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3897/ 159576 | consumed samples: 77456 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.145E-05 | global batch size: 32 | lm loss: 6.818674E+00 | loss scale: 16384.0 | grad norm: 86334.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3898/ 159576 | consumed samples: 77488 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.146E-05 | global batch size: 32 | lm loss: 6.717194E+00 | loss scale: 16384.0 | grad norm: 113951.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3899/ 159576 | consumed samples: 77520 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.714782E+00 | loss scale: 16384.0 | grad norm: 99766.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3900/ 159576 | consumed samples: 77552 | elapsed time per iteration (ms): 14584.1 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.659179E+00 | loss scale: 16384.0 | grad norm: 89663.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3901/ 159576 | consumed samples: 77584 | elapsed time per iteration (ms): 14629.2 | learning rate: 2.148E-05 | global batch size: 32 | lm loss: 6.615579E+00 | loss scale: 16384.0 | grad norm: 68957.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3902/ 159576 | consumed samples: 77616 | elapsed time per iteration (ms): 14617.9 | learning rate: 2.149E-05 | global batch size: 32 | lm loss: 6.606854E+00 | loss scale: 16384.0 | grad norm: 99968.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3903/ 159576 | consumed samples: 77648 | elapsed time per iteration (ms): 14554.1 | learning rate: 2.150E-05 | global batch size: 32 | lm loss: 6.537298E+00 | loss scale: 16384.0 | grad norm: 67921.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3904/ 159576 | consumed samples: 77680 | elapsed time per iteration (ms): 14545.4 | learning rate: 2.151E-05 | global batch size: 32 | lm loss: 6.606940E+00 | loss scale: 16384.0 | grad norm: 145573.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3905/ 159576 | consumed samples: 77712 | elapsed time per iteration (ms): 14521.9 | learning rate: 2.152E-05 | global batch size: 32 | lm loss: 6.625298E+00 | loss scale: 16384.0 | grad norm: 96778.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3906/ 159576 | consumed samples: 77744 | elapsed time per iteration (ms): 14699.2 | learning rate: 2.153E-05 | global batch size: 32 | lm loss: 6.624491E+00 | loss scale: 16384.0 | grad norm: 92738.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3907/ 159576 | consumed samples: 77776 | elapsed time per iteration (ms): 14558.6 | learning rate: 2.154E-05 | global batch size: 32 | lm loss: 6.825802E+00 | loss scale: 16384.0 | grad norm: 119492.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3908/ 159576 | consumed samples: 77808 | elapsed time per iteration (ms): 14547.7 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.591653E+00 | loss scale: 16384.0 | grad norm: 78761.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3909/ 159576 | consumed samples: 77840 | elapsed time per iteration (ms): 14554.0 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.567001E+00 | loss scale: 16384.0 | grad norm: 147075.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3910/ 159576 | consumed samples: 77872 | elapsed time per iteration (ms): 15013.4 | learning rate: 2.156E-05 | global batch size: 32 | lm loss: 6.787440E+00 | loss scale: 16384.0 | grad norm: 142314.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3911/ 159576 | consumed samples: 77904 | elapsed time per iteration (ms): 14566.2 | learning rate: 2.157E-05 | global batch size: 32 | lm loss: 6.525432E+00 | loss scale: 16384.0 | grad norm: 87369.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3912/ 159576 | consumed samples: 77936 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.158E-05 | global batch size: 32 | lm loss: 6.615817E+00 | loss scale: 16384.0 | grad norm: 83904.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3913/ 159576 | consumed samples: 77968 | elapsed time per iteration (ms): 14525.8 | learning rate: 2.159E-05 | global batch size: 32 | lm loss: 6.564670E+00 | loss scale: 16384.0 | grad norm: 97516.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3914/ 159576 | consumed samples: 78000 | elapsed time per iteration (ms): 15027.0 | learning rate: 2.160E-05 | global batch size: 32 | lm loss: 6.400544E+00 | loss scale: 16384.0 | grad norm: 92743.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3915/ 159576 | consumed samples: 78032 | elapsed time per iteration (ms): 14573.6 | learning rate: 2.161E-05 | global batch size: 32 | lm loss: 6.603245E+00 | loss scale: 16384.0 | grad norm: 106541.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3916/ 159576 | consumed samples: 78064 | elapsed time per iteration (ms): 14538.9 | learning rate: 2.162E-05 | global batch size: 32 | lm loss: 6.560642E+00 | loss scale: 16384.0 | grad norm: 71313.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3917/ 159576 | consumed samples: 78096 | elapsed time per iteration (ms): 14550.2 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.578140E+00 | loss scale: 16384.0 | grad norm: 83812.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3918/ 159576 | consumed samples: 78128 | elapsed time per iteration (ms): 14857.6 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.583351E+00 | loss scale: 16384.0 | grad norm: 69616.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3919/ 159576 | consumed samples: 78160 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.164E-05 | global batch size: 32 | lm loss: 6.595952E+00 | loss scale: 16384.0 | grad norm: 83133.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3920/ 159576 | consumed samples: 78192 | elapsed time per iteration (ms): 14502.7 | learning rate: 2.165E-05 | global batch size: 32 | lm loss: 6.645111E+00 | loss scale: 16384.0 | grad norm: 69570.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3921/ 159576 | consumed samples: 78224 | elapsed time per iteration (ms): 14498.8 | learning rate: 2.166E-05 | global batch size: 32 | lm loss: 6.553501E+00 | loss scale: 16384.0 | grad norm: 142896.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3922/ 159576 | consumed samples: 78256 | elapsed time per iteration (ms): 14842.1 | learning rate: 2.167E-05 | global batch size: 32 | lm loss: 6.687614E+00 | loss scale: 16384.0 | grad norm: 107346.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3923/ 159576 | consumed samples: 78288 | elapsed time per iteration (ms): 14567.6 | learning rate: 2.168E-05 | global batch size: 32 | lm loss: 6.764112E+00 | loss scale: 16384.0 | grad norm: 75484.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3924/ 159576 | consumed samples: 78320 | elapsed time per iteration (ms): 14603.6 | learning rate: 2.169E-05 | global batch size: 32 | lm loss: 6.384696E+00 | loss scale: 16384.0 | grad norm: 91570.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3925/ 159576 | consumed samples: 78352 | elapsed time per iteration (ms): 14494.1 | learning rate: 2.170E-05 | global batch size: 32 | lm loss: 6.148740E+00 | loss scale: 16384.0 | grad norm: 66094.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3926/ 159576 | consumed samples: 78384 | elapsed time per iteration (ms): 14880.0 | learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.492467E+00 | loss scale: 16384.0 | grad norm: 95980.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3927/ 159576 | consumed samples: 78416 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.634668E+00 | loss scale: 16384.0 | grad norm: 102240.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3928/ 159576 | consumed samples: 78448 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.172E-05 | global batch size: 32 | lm loss: 6.542571E+00 | loss scale: 16384.0 | grad norm: 78190.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3929/ 159576 | consumed samples: 78480 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.173E-05 | global batch size: 32 | lm loss: 6.546354E+00 | loss scale: 16384.0 | grad norm: 69181.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3930/ 159576 | consumed samples: 78512 | elapsed time per iteration (ms): 14848.7 | learning rate: 2.174E-05 | global batch size: 32 | lm loss: 6.556016E+00 | loss scale: 16384.0 | grad norm: 166890.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3931/ 159576 | consumed samples: 78544 | elapsed time per iteration (ms): 14630.3 | learning rate: 2.175E-05 | global batch size: 32 | lm loss: 6.575625E+00 | loss scale: 16384.0 | grad norm: 67026.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3932/ 159576 | consumed samples: 78576 | elapsed time per iteration (ms): 14503.2 | learning rate: 2.176E-05 | global batch size: 32 | lm loss: 6.528583E+00 | loss scale: 16384.0 | grad norm: 65300.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3933/ 159576 | consumed samples: 78608 | elapsed time per iteration (ms): 14533.6 | learning rate: 2.177E-05 | global batch size: 32 | lm loss: 6.571996E+00 | loss scale: 16384.0 | grad norm: 61530.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3934/ 159576 | consumed samples: 78640 | elapsed time per iteration (ms): 14528.2 | learning rate: 2.178E-05 | global batch size: 32 | lm loss: 6.524823E+00 | loss scale: 16384.0 | grad norm: 58107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3935/ 159576 | consumed samples: 78672 | elapsed time per iteration (ms): 14801.4 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.627916E+00 | loss scale: 16384.0 | grad norm: 64798.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3936/ 159576 | consumed samples: 78704 | elapsed time per iteration (ms): 14509.3 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.511620E+00 | loss scale: 16384.0 | grad norm: 59258.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3937/ 159576 | consumed samples: 78736 | elapsed time per iteration (ms): 14529.7 | learning rate: 2.180E-05 | global batch size: 32 | lm loss: 6.414696E+00 | loss scale: 16384.0 | grad norm: 75598.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3938/ 159576 | consumed samples: 78768 | elapsed time per iteration (ms): 14568.6 | learning rate: 2.181E-05 | global batch size: 32 | lm loss: 6.692476E+00 | loss scale: 16384.0 | grad norm: 68594.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3939/ 159576 | consumed samples: 78800 | elapsed time per iteration (ms): 14680.0 | learning rate: 2.182E-05 | global batch size: 32 | lm loss: 6.509182E+00 | loss scale: 16384.0 | grad norm: 77431.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3940/ 159576 | consumed samples: 78832 | elapsed time per iteration (ms): 14561.3 | learning rate: 2.183E-05 | global batch size: 32 | lm loss: 6.521114E+00 | loss scale: 16384.0 | grad norm: 67107.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3941/ 159576 | consumed samples: 78864 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.184E-05 | global batch size: 32 | lm loss: 6.557777E+00 | loss scale: 16384.0 | grad norm: 82252.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3942/ 159576 | consumed samples: 78896 | elapsed time per iteration (ms): 14516.4 | learning rate: 2.185E-05 | global batch size: 32 | lm loss: 6.519272E+00 | loss scale: 16384.0 | grad norm: 62956.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3943/ 159576 | consumed samples: 78928 | elapsed time per iteration (ms): 14804.0 | learning rate: 2.186E-05 | global batch size: 32 | lm loss: 6.436077E+00 | loss scale: 16384.0 | grad norm: 63372.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3944/ 159576 | consumed samples: 78960 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.536609E+00 | loss scale: 16384.0 | grad norm: 70623.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3945/ 159576 | consumed samples: 78992 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.631818E+00 | loss scale: 16384.0 | grad norm: 62267.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3946/ 159576 | consumed samples: 79024 | elapsed time per iteration (ms): 14592.1 | learning rate: 2.188E-05 | global batch size: 32 | lm loss: 6.263665E+00 | loss scale: 16384.0 | grad norm: 67107.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3947/ 159576 | consumed samples: 79056 | elapsed time per iteration (ms): 14791.6 | learning rate: 2.189E-05 | global batch size: 32 | lm loss: 6.622372E+00 | loss scale: 16384.0 | grad norm: 84764.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3948/ 159576 | consumed samples: 79088 | elapsed time per iteration (ms): 14637.3 | learning rate: 2.190E-05 | global batch size: 32 | lm loss: 6.395759E+00 | loss scale: 16384.0 | grad norm: 60113.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3949/ 159576 | consumed samples: 79120 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.191E-05 | global batch size: 32 | lm loss: 6.588756E+00 | loss scale: 16384.0 | grad norm: 68679.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3950/ 159576 | consumed samples: 79152 | elapsed time per iteration (ms): 14514.6 | learning rate: 2.192E-05 | global batch size: 32 | lm loss: 6.484011E+00 | loss scale: 16384.0 | grad norm: 68729.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3951/ 159576 | consumed samples: 79184 | elapsed time per iteration (ms): 14907.8 | learning rate: 2.193E-05 | global batch size: 32 | lm loss: 6.496289E+00 | loss scale: 16384.0 | grad norm: 58918.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3952/ 159576 | consumed samples: 79216 | elapsed time per iteration (ms): 14467.7 | learning rate: 2.194E-05 | global batch size: 32 | lm loss: 6.442475E+00 | loss scale: 16384.0 | grad norm: 73240.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3953/ 159576 | consumed samples: 79248 | elapsed time per iteration (ms): 14613.3 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 6.412640E+00 | loss scale: 16384.0 | grad norm: 63495.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3954/ 159576 | consumed samples: 79280 | elapsed time per iteration (ms): 14497.1 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 6.419092E+00 | loss scale: 16384.0 | grad norm: 64832.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3955/ 159576 | consumed samples: 79312 | elapsed time per iteration (ms): 14864.8 | learning rate: 2.196E-05 | global batch size: 32 | lm loss: 6.411493E+00 | loss scale: 16384.0 | grad norm: 70227.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3956/ 159576 | consumed samples: 79344 | elapsed time per iteration (ms): 14501.1 | learning rate: 2.197E-05 | global batch size: 32 | lm loss: 6.377773E+00 | loss scale: 16384.0 | grad norm: 65521.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3957/ 159576 | consumed samples: 79376 | elapsed time per iteration (ms): 14522.7 | learning rate: 2.198E-05 | global batch size: 32 | lm loss: 6.458980E+00 | loss scale: 16384.0 | grad norm: 62294.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3958/ 159576 | consumed samples: 79408 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.199E-05 | global batch size: 32 | lm loss: 6.540348E+00 | loss scale: 16384.0 | grad norm: 64994.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3959/ 159576 | consumed samples: 79440 | elapsed time per iteration (ms): 14868.7 | learning rate: 2.200E-05 | global batch size: 32 | lm loss: 6.503858E+00 | loss scale: 16384.0 | grad norm: 54271.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3960/ 159576 | consumed samples: 79472 | elapsed time per iteration (ms): 14512.5 | learning rate: 2.201E-05 | global batch size: 32 | lm loss: 6.372645E+00 | loss scale: 16384.0 | grad norm: 73237.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3961/ 159576 | consumed samples: 79504 | elapsed time per iteration (ms): 14552.3 | learning rate: 2.202E-05 | global batch size: 32 | lm loss: 6.396554E+00 | loss scale: 16384.0 | grad norm: 64579.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3962/ 159576 | consumed samples: 79536 | elapsed time per iteration (ms): 14559.3 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.556979E+00 | loss scale: 16384.0 | grad norm: 83489.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3963/ 159576 | consumed samples: 79568 | elapsed time per iteration (ms): 14899.9 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.458327E+00 | loss scale: 16384.0 | grad norm: 58716.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3964/ 159576 | consumed samples: 79600 | elapsed time per iteration (ms): 14539.5 | learning rate: 2.204E-05 | global batch size: 32 | lm loss: 6.802517E+00 | loss scale: 16384.0 | grad norm: 60731.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3965/ 159576 | consumed samples: 79632 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.205E-05 | global batch size: 32 | lm loss: 6.616902E+00 | loss scale: 16384.0 | grad norm: 64155.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0
| time (ms) iteration 3966/ 159576 | consumed samples: 79664 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.206E-05 | global batch size: 32 | lm loss: 6.457995E+00 | loss scale: 16384.0 | grad norm: 74880.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3967/ 159576 | consumed samples: 79696 | elapsed time per iteration (ms): 14850.0 | learning rate: 2.207E-05 | global batch size: 32 | lm loss: 6.591904E+00 | loss scale: 16384.0 | grad norm: 75336.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3968/ 159576 | consumed samples: 79728 | elapsed time per iteration (ms): 14661.7 | learning rate: 2.208E-05 | global batch size: 32 | lm loss: 6.475752E+00 | loss scale: 16384.0 | grad norm: 76852.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3969/ 159576 | consumed samples: 79760 | elapsed time per iteration (ms): 14523.7 | learning rate: 2.209E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 16384.0 | grad norm: 65844.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3970/ 159576 | consumed samples: 79792 | elapsed time per iteration (ms): 14549.1 | learning rate: 2.210E-05 | global batch size: 32 | lm loss: 6.401618E+00 | loss scale: 16384.0 | grad norm: 84954.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3971/ 159576 | consumed samples: 79824 | elapsed time per iteration (ms): 14508.8 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.516178E+00 | loss scale: 16384.0 | grad norm: 71111.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3972/ 159576 | consumed samples: 79856 | elapsed time per iteration (ms): 14847.5 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.601567E+00 | loss scale: 16384.0 | grad norm: 74563.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3973/ 159576 | consumed samples: 79888 | elapsed time per iteration (ms): 14594.0 | learning rate: 2.212E-05 | global batch size: 32 | lm loss: 6.441951E+00 | loss scale: 16384.0 | grad norm: 72653.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3974/ 159576 | consumed samples: 79920 | elapsed time per iteration (ms): 14478.4 | learning rate: 2.213E-05 | global batch size: 32 | lm loss: 6.510294E+00 | loss scale: 16384.0 | grad norm: 65083.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3975/ 159576 | consumed samples: 79952 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.214E-05 | global batch size: 32 | lm loss: 6.345959E+00 | loss scale: 16384.0 | grad norm: 133600.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3976/ 159576 | consumed samples: 79984 | elapsed time per iteration (ms): 14770.3 | learning rate: 2.215E-05 | global batch size: 32 | lm loss: 6.477483E+00 | loss scale: 16384.0 | grad norm: 89443.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3977/ 159576 | consumed samples: 80016 | elapsed time per iteration (ms): 14483.7 | learning rate: 2.216E-05 | global batch size: 32 | lm loss: 6.466526E+00 | loss scale: 
16384.0 | grad norm: 79203.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3978/ 159576 | consumed samples: 80048 | elapsed time per iteration (ms): 14548.9 | learning rate: 2.217E-05 | global batch size: 32 | lm loss: 6.490917E+00 | loss scale: 16384.0 | grad norm: 85035.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3979/ 159576 | consumed samples: 80080 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.412145E+00 | loss scale: 16384.0 | grad norm: 93580.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3980/ 159576 | consumed samples: 80112 | elapsed time per iteration (ms): 14659.7 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.473646E+00 | loss scale: 16384.0 | grad norm: 79422.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3981/ 159576 | consumed samples: 80144 | elapsed time per iteration (ms): 14525.1 | learning rate: 2.219E-05 | global batch size: 32 | lm loss: 6.522334E+00 | loss scale: 16384.0 | grad norm: 83533.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3982/ 159576 | consumed samples: 80176 | elapsed time per iteration (ms): 14543.1 | learning rate: 2.220E-05 | global batch size: 32 | lm loss: 6.387228E+00 | loss scale: 16384.0 | grad norm: 89795.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3983/ 159576 | consumed samples: 80208 | elapsed time per iteration (ms): 14609.8 | learning rate: 2.221E-05 | global batch size: 32 | lm loss: 6.475267E+00 | loss scale: 16384.0 | grad norm: 119598.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3984/ 159576 | consumed samples: 80240 | elapsed time per iteration (ms): 14596.2 | learning rate: 2.222E-05 | global batch size: 32 | lm loss: 6.533351E+00 | loss scale: 16384.0 | grad norm: 72306.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3985/ 159576 | consumed samples: 80272 | elapsed time per iteration (ms): 14621.5 | learning rate: 2.223E-05 | global batch size: 32 | lm loss: 6.540237E+00 | loss scale: 16384.0 | grad norm: 88358.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3986/ 159576 | consumed samples: 80304 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.224E-05 | global batch size: 32 | lm loss: 6.419699E+00 | loss scale: 16384.0 | grad norm: 75411.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3987/ 159576 | consumed samples: 80336 | elapsed time per iteration (ms): 14555.9 | learning rate: 2.225E-05 | global batch size: 32 | lm loss: 6.591748E+00 | loss scale: 16384.0 | grad norm: 112139.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3988/ 159576 | consumed samples: 80368 | elapsed time per iteration (ms): 15004.4 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.551664E+00 | loss scale: 16384.0 | grad norm: 88397.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 3989/ 159576 | consumed samples: 80400 | elapsed time per 
iteration (ms): 14610.9 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.531049E+00 | loss scale: 16384.0 | grad norm: 63924.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3990/ 159576 | consumed samples: 80432 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.227E-05 | global batch size: 32 | lm loss: 6.546918E+00 | loss scale: 16384.0 | grad norm: 97299.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3991/ 159576 | consumed samples: 80464 | elapsed time per iteration (ms): 14437.4 | learning rate: 2.228E-05 | global batch size: 32 | lm loss: 6.471569E+00 | loss scale: 16384.0 | grad norm: 76326.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3992/ 159576 | consumed samples: 80496 | elapsed time per iteration (ms): 14906.8 | learning rate: 2.229E-05 | global batch size: 32 | lm loss: 6.525407E+00 | loss scale: 16384.0 | grad norm: 77183.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3993/ 159576 | consumed samples: 80528 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.230E-05 | global batch size: 32 | lm loss: 6.539597E+00 | loss scale: 16384.0 | grad norm: 60376.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3994/ 159576 | consumed samples: 80560 | elapsed time per iteration (ms): 14579.3 | learning rate: 2.231E-05 | global batch size: 32 | lm loss: 6.552666E+00 | loss scale: 16384.0 | grad norm: 84746.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3995/ 159576 | consumed samples: 80592 | elapsed time per iteration (ms): 14529.3 | learning rate: 2.232E-05 | global batch size: 32 | lm loss: 6.413946E+00 | loss scale: 16384.0 | grad norm: 67969.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3996/ 159576 | consumed samples: 80624 | elapsed time per iteration (ms): 14922.8 | learning rate: 2.233E-05 | global batch size: 32 | lm loss: 6.427969E+00 | loss scale: 16384.0 | grad norm: 69649.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3997/ 159576 | consumed samples: 80656 | elapsed time per iteration (ms): 14490.2 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.527985E+00 | loss scale: 16384.0 | grad norm: 71644.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3998/ 159576 | consumed samples: 80688 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.565372E+00 | loss scale: 16384.0 | grad norm: 75823.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3999/ 159576 | consumed samples: 80720 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.235E-05 | global batch size: 32 | lm loss: 6.377391E+00 | loss scale: 16384.0 | grad norm: 85967.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 17:49:17,238] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=6, lr=[2.2362426035502958e-05, 2.2362426035502958e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 6.3572 iter time (s): 0.007 samples/sec: 4556.985
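(Note: the "loss scale: 16384.0" field in every record and the "skipped=6" counter in the DeepSpeed step summary above come from dynamic loss scaling for fp16 training. The sketch below is a minimal illustration of the usual scheme, assuming a growth interval of 1000 steps and a factor of 2; it is not the actual DeepSpeed implementation.)

```python
# Minimal sketch of dynamic loss scaling (illustrative only; the real
# logic lives inside DeepSpeed/Megatron's mixed-precision optimizer).
class DynamicLossScaler:
    def __init__(self, init_scale=2**14, growth_interval=1000, factor=2.0):
        self.scale = float(init_scale)   # 2**14 = 16384.0, as in the log above
        self.growth_interval = growth_interval
        self.factor = factor
        self.good_steps = 0              # overflow-free steps since last change
        self.skipped = 0                 # mirrors "skipped=6" in the step summary

    def update(self, grads_have_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if grads_have_overflow:          # inf/nan found in the scaled gradients
            self.scale = max(1.0, self.scale / self.factor)
            self.good_steps = 0
            self.skipped += 1            # the whole iteration is skipped
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            self.scale *= self.factor    # probe a larger scale again
            self.good_steps = 0
        return True

scaler = DynamicLossScaler()
print(scaler.scale)  # 16384.0
```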
iteration 4000/ 159576 | consumed samples: 80752 | elapsed time per iteration (ms): 14858.7 | learning rate: 2.236E-05 | global batch size: 32 | lm loss: 6.357164E+00 | loss scale: 16384.0 | grad norm: 56878.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 4000 | lm loss value: 6.453290E+00 | lm loss PPL: 6.347876E+02 |
------------------------------------------------------------------------------------------------
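(Note: the reported "lm loss PPL" is simply the exponential of the "lm loss value", since the loss is a per-token cross entropy in nats. A quick check against the numbers above:)

```python
import math

lm_loss = 6.453290        # "lm loss value" at iteration 4000
ppl = math.exp(lm_loss)   # perplexity = e**loss for a nats-based LM loss
print(f"{ppl:.4f}")       # 634.7876, matching "lm loss PPL: 6.347876E+02"
```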
iteration 4001/ 159576 | consumed samples: 80784 | elapsed time per iteration (ms): 20796.3 | learning rate: 2.237E-05 | global batch size: 32 | lm loss: 6.357805E+00 | loss scale: 16384.0 | grad norm: 75271.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4002/ 159576 | consumed samples: 80816 | elapsed time per iteration (ms): 14528.3 | learning rate: 2.238E-05 | global batch size: 32 | lm loss: 6.590372E+00 | loss scale: 16384.0 | grad norm: 82823.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4003/ 159576 | consumed samples: 80848 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.239E-05 | global batch size: 32 | lm loss: 6.547601E+00 | loss scale: 16384.0 | grad norm: 63495.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4004/ 159576 | consumed samples: 80880 | elapsed time per iteration (ms): 14981.7 | learning rate: 2.240E-05 | global batch size: 32 | lm loss: 6.488581E+00 | loss scale: 16384.0 | grad norm: 84538.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4005/ 159576 | consumed samples: 80912 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.241E-05 | global batch size: 32 | lm loss: 6.473035E+00 | loss scale: 16384.0 | grad norm: 69154.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4006/ 159576 | consumed samples: 80944 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.574604E+00 | loss scale: 16384.0 | grad norm: 71258.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4007/ 159576 | consumed samples: 80976 | elapsed time per iteration (ms): 14530.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.480978E+00 | loss scale: 16384.0 | grad norm: 63598.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4008/ 159576 | consumed samples: 81008 | elapsed time per iteration (ms): 15052.4 | learning rate: 2.243E-05 | global batch size: 32 | lm loss: 6.393389E+00 | loss scale: 16384.0 | grad norm: 76474.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4009/ 159576 | consumed samples: 81040 | elapsed time per iteration (ms): 14618.9 | learning rate: 2.244E-05 | global batch size: 32 | lm loss: 6.322450E+00 | loss scale: 16384.0 | grad norm: 62736.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4010/ 159576 | consumed samples: 81072 | elapsed time per iteration (ms): 14521.7 | learning rate: 2.245E-05 | global batch size: 32 | lm loss: 6.502364E+00 | loss scale: 16384.0 | grad norm: 78751.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4011/ 159576 | consumed samples: 81104 | elapsed time per iteration (ms): 14513.4 | learning rate: 2.246E-05 | global batch size: 32 | lm loss: 6.504915E+00 | loss scale: 16384.0 | grad norm: 73290.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4012/ 159576 | consumed samples: 81136 | elapsed time per iteration (ms): 14859.5 | learning rate: 2.247E-05 | global batch size: 32 | lm loss: 6.422670E+00 | loss scale: 16384.0 | grad norm: 70911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4013/ 159576 | consumed samples: 81168 | elapsed time per iteration (ms): 14562.7 | learning rate: 2.248E-05 | global batch size: 32 | lm loss: 6.460926E+00 | loss scale: 16384.0 | grad norm: 88361.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4014/ 159576 | consumed samples: 81200 | elapsed time per iteration (ms): 14537.6 | learning rate: 2.249E-05 | global batch size: 32 | lm loss: 6.359708E+00 | loss scale: 16384.0 | grad norm: 70950.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4015/ 159576 | consumed samples: 81232 | elapsed time per iteration (ms): 14575.5 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.479752E+00 | loss scale: 16384.0 | grad norm: 60916.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4016/ 159576 | consumed samples: 81264 | elapsed time per iteration (ms): 14890.4 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.438080E+00 | loss scale: 16384.0 | grad norm: 78503.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4017/ 159576 | consumed samples: 81296 | elapsed time per iteration (ms): 14519.4 | learning rate: 2.251E-05 | global batch size: 32 | lm loss: 6.446492E+00 | loss scale: 16384.0 | grad norm: 66299.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4018/ 159576 | consumed samples: 81328 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.252E-05 | global batch size: 32 | lm loss: 6.418320E+00 | loss scale: 16384.0 | grad norm: 65936.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4019/ 159576 | consumed samples: 81360 | elapsed time per iteration (ms): 14568.1 | learning rate: 2.253E-05 | global batch size: 32 | lm loss: 6.337445E+00 | loss scale: 16384.0 | grad norm: 71727.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4020/ 159576 | consumed samples: 81392 | elapsed time per iteration (ms): 14867.3 | learning rate: 2.254E-05 | global batch size: 32 | lm loss: 6.564549E+00 | loss scale: 16384.0 | grad norm: 96122.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4021/ 159576 | consumed samples: 81424 | elapsed time per iteration (ms): 14435.4 | learning rate: 2.255E-05 | global batch size: 32 | lm loss: 6.485852E+00 | loss scale: 16384.0 | grad norm: 82597.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4022/ 159576 | consumed samples: 81456 | elapsed time per iteration (ms): 14558.0 | learning rate: 2.256E-05 | global batch
size: 32 | lm loss: 6.539099E+00 | loss scale: 16384.0 | grad norm: 121006.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4023/ 159576 | consumed samples: 81488 | elapsed time per iteration (ms): 14530.8 | learning rate: 2.257E-05 | global batch size: 32 | lm loss: 6.588836E+00 | loss scale: 16384.0 | grad norm: 83990.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4024/ 159576 | consumed samples: 81520 | elapsed time per iteration (ms): 14903.1 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.478038E+00 | loss scale: 16384.0 | grad norm: 86310.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4025/ 159576 | consumed samples: 81552 | elapsed time per iteration (ms): 14640.8 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.423618E+00 | loss scale: 16384.0 | grad norm: 72646.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4026/ 159576 | consumed samples: 81584 | elapsed time per iteration (ms): 14523.1 | learning rate: 2.259E-05 | global batch size: 32 | lm loss: 6.389876E+00 | loss scale: 16384.0 | grad norm: 75260.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4027/ 159576 | consumed samples: 81616 | elapsed time per iteration (ms): 14495.3 | learning rate: 2.260E-05 | global batch size: 32 | lm loss: 6.686980E+00 | loss scale: 16384.0 | grad norm: 68901.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4028/ 159576 | consumed samples: 81648 | elapsed time per iteration (ms): 14518.7 | learning rate: 2.261E-05 | global batch size: 32 | lm loss: 6.454273E+00 | loss scale: 16384.0 | grad norm: 78058.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4029/ 159576 | consumed samples: 81680 | elapsed time per iteration (ms): 14751.7 | learning rate: 2.262E-05 | global batch size: 32 | lm loss: 6.645922E+00 | loss scale: 16384.0 | grad norm: 90877.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4030/ 159576 | consumed samples: 81712 | elapsed time per iteration (ms): 14605.8 | learning rate: 2.263E-05 | global batch size: 32 | lm loss: 6.554152E+00 | loss scale: 16384.0 | grad norm: 71333.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4031/ 159576 | consumed samples: 81744 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.264E-05 | global batch size: 32 | lm loss: 6.512757E+00 | loss scale: 16384.0 | grad norm: 75409.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4032/ 159576 | consumed samples: 81776 | elapsed time per iteration (ms): 14627.7 | learning rate: 2.265E-05 | global batch size: 32 | lm loss: 6.529600E+00 | loss scale: 16384.0 | grad norm: 83852.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4033/ 159576 | consumed samples: 81808 | elapsed time per iteration (ms): 14706.7 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.312231E+00 | loss scale: 16384.0 | grad norm: 64610.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4034/ 159576 | 
consumed samples: 81840 | elapsed time per iteration (ms): 14453.1 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.378237E+00 | loss scale: 16384.0 | grad norm: 70363.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4035/ 159576 | consumed samples: 81872 | elapsed time per iteration (ms): 14558.4 | learning rate: 2.267E-05 | global batch size: 32 | lm loss: 6.617406E+00 | loss scale: 16384.0 | grad norm: 76776.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4036/ 159576 | consumed samples: 81904 | elapsed time per iteration (ms): 14451.4 | learning rate: 2.268E-05 | global batch size: 32 | lm loss: 6.510260E+00 | loss scale: 16384.0 | grad norm: 65763.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4037/ 159576 | consumed samples: 81936 | elapsed time per iteration (ms): 14734.4 | learning rate: 2.269E-05 | global batch size: 32 | lm loss: 6.484540E+00 | loss scale: 16384.0 | grad norm: 113964.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4038/ 159576 | consumed samples: 81968 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.270E-05 | global batch size: 32 | lm loss: 6.422564E+00 | loss scale: 16384.0 | grad norm: 71196.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4039/ 159576 | consumed samples: 82000 | elapsed time per iteration (ms): 14521.4 | learning rate: 2.271E-05 | global batch size: 32 | lm loss: 6.468810E+00 | loss scale: 16384.0 | grad norm: 81464.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4040/ 159576 | consumed samples: 82032 | elapsed time per iteration (ms): 14534.9 | learning rate: 2.272E-05 | global batch size: 32 | lm loss: 6.528829E+00 | loss scale: 16384.0 | grad norm: 64883.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4041/ 159576 | consumed samples: 82064 | elapsed time per iteration (ms): 14840.7 | learning rate: 2.273E-05 | global batch size: 32 | lm loss: 6.466451E+00 | loss scale: 16384.0 | grad norm: 113319.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4042/ 159576 | consumed samples: 82096 | elapsed time per iteration (ms): 14627.3 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.455089E+00 | loss scale: 16384.0 | grad norm: 63704.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4043/ 159576 | consumed samples: 82128 | elapsed time per iteration (ms): 14401.0 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.394213E+00 | loss scale: 16384.0 | grad norm: 104510.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4044/ 159576 | consumed samples: 82160 | elapsed time per iteration (ms): 14522.2 | learning rate: 2.275E-05 | global batch size: 32 | lm loss: 6.436733E+00 | loss scale: 16384.0 | grad norm: 69916.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4045/ 159576 | consumed samples: 82192 | elapsed time per iteration (ms): 14878.3 | learning rate: 2.276E-05 | global batch size: 32 | lm loss: 6.467334E+00 | loss scale: 16384.0 | grad norm: 86814.439 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4046/ 159576 | consumed samples: 82224 | elapsed time per iteration (ms): 14619.5 | learning rate: 2.277E-05 | global batch size: 32 | lm loss: 6.542828E+00 | loss scale: 16384.0 | grad norm: 91169.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4047/ 159576 | consumed samples: 82256 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.278E-05 | global batch size: 32 | lm loss: 6.482902E+00 | loss scale: 16384.0 | grad norm: 71855.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4048/ 159576 | consumed samples: 82288 | elapsed time per iteration (ms): 14535.3 | learning rate: 2.279E-05 | global batch size: 32 | lm loss: 6.380974E+00 | loss scale: 16384.0 | grad norm: 110448.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4049/ 159576 | consumed samples: 82320 | elapsed time per iteration (ms): 14946.7 | learning rate: 2.280E-05 | global batch size: 32 | lm loss: 6.604033E+00 | loss scale: 16384.0 | grad norm: 86973.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4050/ 159576 | consumed samples: 82352 | elapsed time per iteration (ms): 14452.3 | learning rate: 2.281E-05 | global batch size: 32 | lm loss: 6.485418E+00 | loss scale: 16384.0 | grad norm: 93547.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4051/ 159576 | consumed samples: 82384 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.447795E+00 | loss scale: 16384.0 | grad norm: 71623.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4052/ 159576 | consumed samples: 82416 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.490433E+00 | loss scale: 16384.0 | grad norm: 122748.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4053/ 159576 | consumed samples: 82448 | elapsed time per iteration (ms): 14923.8 | learning rate: 2.283E-05 | global batch size: 32 | lm loss: 6.393107E+00 | loss scale: 16384.0 | grad norm: 94716.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4054/ 159576 | consumed samples: 82480 | elapsed time per iteration (ms): 14522.3 | learning rate: 2.284E-05 | global batch size: 32 | lm loss: 6.560749E+00 | loss scale: 16384.0 | grad norm: 87911.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4055/ 159576 | consumed samples: 82512 | elapsed time per iteration (ms): 14576.1 | learning rate: 2.285E-05 | global batch size: 32 | lm loss: 6.508199E+00 | loss scale: 16384.0 | grad norm: 75712.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4056/ 159576 | consumed samples: 82544 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.286E-05 | global batch size: 32 | lm loss: 6.480619E+00 | loss scale: 16384.0 | grad norm: 92968.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4057/ 159576 | consumed samples: 82576 | elapsed time per iteration (ms): 14814.4 | learning rate: 
2.287E-05 | global batch size: 32 | lm loss: 6.324226E+00 | loss scale: 16384.0 | grad norm: 78472.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4058/ 159576 | consumed samples: 82608 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.288E-05 | global batch size: 32 | lm loss: 6.626959E+00 | loss scale: 16384.0 | grad norm: 80531.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4059/ 159576 | consumed samples: 82640 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.406682E+00 | loss scale: 16384.0 | grad norm: 75308.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4060/ 159576 | consumed samples: 82672 | elapsed time per iteration (ms): 14562.2 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.440542E+00 | loss scale: 16384.0 | grad norm: 78114.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4061/ 159576 | consumed samples: 82704 | elapsed time per iteration (ms): 14796.0 | learning rate: 2.290E-05 | global batch size: 32 | lm loss: 6.468933E+00 | loss scale: 16384.0 | grad norm: 77154.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4062/ 159576 | consumed samples: 82736 | elapsed time per iteration (ms): 14696.5 | learning rate: 2.291E-05 | global batch size: 32 | lm loss: 6.318196E+00 | loss scale: 16384.0 | grad norm: 97551.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4063/ 159576 | consumed samples: 82768 | elapsed time per iteration (ms): 14468.1 | learning rate: 2.292E-05 | global batch size: 32 | lm loss: 6.472930E+00 | loss scale: 16384.0 | grad norm: 110041.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4064/ 159576 | consumed samples: 82800 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.293E-05 | global batch size: 32 | lm loss: 6.523721E+00 | loss scale: 16384.0 | grad norm: 88018.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4065/ 159576 | consumed samples: 82832 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.294E-05 | global batch size: 32 | lm loss: 6.453180E+00 | loss scale: 16384.0 | grad norm: 83087.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4066/ 159576 | consumed samples: 82864 | elapsed time per iteration (ms): 14884.4 | learning rate: 2.295E-05 | global batch size: 32 | lm loss: 6.447326E+00 | loss scale: 16384.0 | grad norm: 72433.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4067/ 159576 | consumed samples: 82896 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.296E-05 | global batch size: 32 | lm loss: 6.366633E+00 | loss scale: 16384.0 | grad norm: 100504.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4068/ 159576 | consumed samples: 82928 | elapsed time per iteration (ms): 14561.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.315294E+00 | loss scale: 16384.0 | grad norm: 79868.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
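(Note: every iteration record follows the same pipe-delimited layout, so the loss and grad-norm series can be recovered mechanically. A parsing sketch under the assumption that the log is saved as plain text; the file name "train.log" and the spike threshold are made up for illustration:)

```python
import re

# One match per iteration record; field names follow the log's own labels.
RECORD = re.compile(
    r"iteration\s+(?P<it>\d+)/\s*\d+ \| consumed samples: (?P<samples>\d+) \|"
    r".*?lm loss: (?P<loss>[\d.E+-]+) \|"
    r".*?grad norm: (?P<gnorm>[\d.]+)",
    re.S,  # DOTALL, in case a record is wrapped across lines
)

def parse(text):
    for m in RECORD.finditer(text):
        yield int(m["it"]), int(m["samples"]), float(m["loss"]), float(m["gnorm"])

with open("train.log") as f:       # hypothetical file name
    for it, samples, loss, gnorm in parse(f.read()):
        if gnorm > 150_000:        # flag grad-norm spikes like iteration 4119
            print(it, loss, gnorm)
```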
iteration 4069/ 159576 | consumed samples: 82960 | elapsed time per iteration (ms): 14538.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.452709E+00 | loss scale: 16384.0 | grad norm: 94073.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4070/ 159576 | consumed samples: 82992 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.298E-05 | global batch size: 32 | lm loss: 6.421084E+00 | loss scale: 16384.0 | grad norm: 96558.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4071/ 159576 | consumed samples: 83024 | elapsed time per iteration (ms): 14508.0 | learning rate: 2.299E-05 | global batch size: 32 | lm loss: 6.474918E+00 | loss scale: 16384.0 | grad norm: 104437.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4072/ 159576 | consumed samples: 83056 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.300E-05 | global batch size: 32 | lm loss: 6.442264E+00 | loss scale: 16384.0 | grad norm: 69985.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 4073/ 159576 | consumed samples: 83088 | elapsed time per iteration (ms): 14430.9 | learning rate: 2.301E-05 | global batch size: 32 | lm loss: 6.464416E+00 | loss scale: 16384.0 | grad norm: 92935.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4074/ 159576 | consumed samples: 83120 | elapsed time per iteration (ms): 14595.5 | learning rate: 2.302E-05 | global batch size: 32 | lm loss: 6.394172E+00 | loss scale: 16384.0 | grad norm: 93727.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4075/ 159576 | consumed samples: 83152 | elapsed time per iteration (ms): 14478.6 | learning rate: 2.303E-05 | global batch size: 32 | lm loss: 6.535138E+00 | loss scale: 16384.0 | grad norm: 110910.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4076/ 159576 | consumed samples: 83184 | elapsed time per iteration (ms): 14559.7 | learning rate: 2.304E-05 | global batch size: 32 | lm loss: 6.459756E+00 | loss scale: 16384.0 | grad norm: 79798.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4077/ 159576 | consumed samples: 83216 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.388766E+00 | loss scale: 16384.0 | grad norm: 80153.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4078/ 159576 | consumed samples: 83248 | elapsed time per iteration (ms): 15028.3 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.462305E+00 | loss scale: 16384.0 | grad norm: 72541.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
iterations: 0 | time (ms) iteration 4079/ 159576 | consumed samples: 83280 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.306E-05 | global batch size: 32 | lm loss: 6.606649E+00 | loss scale: 16384.0 | grad norm: 72682.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4080/ 159576 | consumed samples: 83312 | elapsed time per iteration (ms): 14478.7 | learning rate: 2.307E-05 | global batch size: 32 | lm loss: 6.339183E+00 | loss scale: 16384.0 | grad norm: 77952.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4081/ 159576 | consumed samples: 83344 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.308E-05 | global batch size: 32 | lm loss: 6.482682E+00 | loss scale: 16384.0 | grad norm: 78541.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4082/ 159576 | consumed samples: 83376 | elapsed time per iteration (ms): 14971.6 | learning rate: 2.309E-05 | global batch size: 32 | lm loss: 6.464870E+00 | loss scale: 16384.0 | grad norm: 82812.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4083/ 159576 | consumed samples: 83408 | elapsed time per iteration (ms): 14619.1 | learning rate: 2.310E-05 | global batch size: 32 | lm loss: 6.468065E+00 | loss scale: 16384.0 | grad norm: 95549.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4084/ 159576 | consumed samples: 83440 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.311E-05 | global batch size: 32 | lm loss: 6.390970E+00 | loss scale: 16384.0 | grad norm: 76775.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4085/ 159576 | consumed samples: 83472 | elapsed time per iteration (ms): 14597.4 | learning rate: 2.312E-05 | global batch size: 32 | lm loss: 6.441597E+00 | loss scale: 16384.0 | grad norm: 87885.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4086/ 159576 | consumed samples: 83504 | elapsed time per iteration (ms): 14827.9 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.332308E+00 | loss scale: 16384.0 | grad norm: 67530.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4087/ 159576 | consumed samples: 83536 | elapsed time per iteration (ms): 14496.3 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.360069E+00 | loss scale: 16384.0 | grad norm: 65277.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4088/ 159576 | consumed samples: 83568 | elapsed time per iteration (ms): 14505.1 | learning rate: 2.314E-05 | global batch size: 32 | lm loss: 6.331870E+00 | loss scale: 16384.0 | grad norm: 73276.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4089/ 159576 | consumed samples: 83600 | elapsed time per iteration (ms): 14518.3 | learning rate: 2.315E-05 | global batch size: 32 | lm loss: 6.279953E+00 | loss scale: 16384.0 | grad norm: 69193.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4090/ 159576 | consumed samples: 83632 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.316E-05 | global batch size: 32 | lm loss: 6.473932E+00 | loss 
scale: 16384.0 | grad norm: 78838.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4091/ 159576 | consumed samples: 83664 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.317E-05 | global batch size: 32 | lm loss: 6.346605E+00 | loss scale: 16384.0 | grad norm: 76401.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4092/ 159576 | consumed samples: 83696 | elapsed time per iteration (ms): 14611.5 | learning rate: 2.318E-05 | global batch size: 32 | lm loss: 6.444325E+00 | loss scale: 16384.0 | grad norm: 85411.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4093/ 159576 | consumed samples: 83728 | elapsed time per iteration (ms): 14540.2 | learning rate: 2.319E-05 | global batch size: 32 | lm loss: 6.498468E+00 | loss scale: 16384.0 | grad norm: 97013.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4094/ 159576 | consumed samples: 83760 | elapsed time per iteration (ms): 14934.5 | learning rate: 2.320E-05 | global batch size: 32 | lm loss: 6.368524E+00 | loss scale: 16384.0 | grad norm: 75310.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4095/ 159576 | consumed samples: 83792 | elapsed time per iteration (ms): 14479.4 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.445729E+00 | loss scale: 16384.0 | grad norm: 79666.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4096/ 159576 | consumed samples: 83824 | elapsed time per iteration (ms): 14539.3 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.478226E+00 | loss scale: 16384.0 | grad norm: 74953.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4097/ 159576 | consumed samples: 83856 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.322E-05 | global batch size: 32 | lm loss: 6.494800E+00 | loss scale: 16384.0 | grad norm: 83444.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4098/ 159576 | consumed samples: 83888 | elapsed time per iteration (ms): 14987.3 | learning rate: 2.323E-05 | global batch size: 32 | lm loss: 6.549989E+00 | loss scale: 16384.0 | grad norm: 73065.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4099/ 159576 | consumed samples: 83920 | elapsed time per iteration (ms): 14510.7 | learning rate: 2.324E-05 | global batch size: 32 | lm loss: 6.523539E+00 | loss scale: 16384.0 | grad norm: 83625.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4100/ 159576 | consumed samples: 83952 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.325E-05 | global batch size: 32 | lm loss: 6.451036E+00 | loss scale: 16384.0 | grad norm: 74563.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4101/ 159576 | consumed samples: 83984 | elapsed time per iteration (ms): 14604.4 | learning rate: 2.326E-05 | global batch size: 32 | lm loss: 6.472479E+00 | loss scale: 16384.0 | grad norm: 109783.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4102/ 159576 | consumed samples: 84016 | elapsed time per 
iteration (ms): 14804.2 | learning rate: 2.327E-05 | global batch size: 32 | lm loss: 6.392324E+00 | loss scale: 16384.0 | grad norm: 77708.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4103/ 159576 | consumed samples: 84048 | elapsed time per iteration (ms): 14666.7 | learning rate: 2.328E-05 | global batch size: 32 | lm loss: 6.388014E+00 | loss scale: 16384.0 | grad norm: 72228.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4104/ 159576 | consumed samples: 84080 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.351237E+00 | loss scale: 16384.0 | grad norm: 75762.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4105/ 159576 | consumed samples: 84112 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.445687E+00 | loss scale: 16384.0 | grad norm: 71985.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4106/ 159576 | consumed samples: 84144 | elapsed time per iteration (ms): 14555.0 | learning rate: 2.330E-05 | global batch size: 32 | lm loss: 6.450569E+00 | loss scale: 16384.0 | grad norm: 70873.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4107/ 159576 | consumed samples: 84176 | elapsed time per iteration (ms): 14836.4 | learning rate: 2.331E-05 | global batch size: 32 | lm loss: 6.490268E+00 | loss scale: 16384.0 | grad norm: 62324.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4108/ 159576 | consumed samples: 84208 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.332E-05 | global batch size: 32 | lm loss: 6.503112E+00 | loss scale: 16384.0 | grad norm: 80147.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4109/ 159576 | consumed samples: 84240 | elapsed time per iteration (ms): 14516.1 | learning rate: 2.333E-05 | global batch size: 32 | lm loss: 6.575756E+00 | loss scale: 16384.0 | grad norm: 85277.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4110/ 159576 | consumed samples: 84272 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.334E-05 | global batch size: 32 | lm loss: 6.521991E+00 | loss scale: 16384.0 | grad norm: 88147.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4111/ 159576 | consumed samples: 84304 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.335E-05 | global batch size: 32 | lm loss: 6.583647E+00 | loss scale: 16384.0 | grad norm: 90470.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4112/ 159576 | consumed samples: 84336 | elapsed time per iteration (ms): 14501.6 | learning rate: 2.336E-05 | global batch size: 32 | lm loss: 6.307788E+00 | loss scale: 16384.0 | grad norm: 84679.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4113/ 159576 | consumed samples: 84368 | elapsed time per iteration (ms): 14565.5 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.392709E+00 | loss scale: 16384.0 | grad norm: 85222.050 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4114/ 159576 | consumed samples: 84400 | elapsed time per iteration (ms): 14580.4 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.384982E+00 | loss scale: 16384.0 | grad norm: 101932.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4115/ 159576 | consumed samples: 84432 | elapsed time per iteration (ms): 14793.7 | learning rate: 2.338E-05 | global batch size: 32 | lm loss: 6.402984E+00 | loss scale: 16384.0 | grad norm: 80725.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4116/ 159576 | consumed samples: 84464 | elapsed time per iteration (ms): 14599.8 | learning rate: 2.339E-05 | global batch size: 32 | lm loss: 6.431032E+00 | loss scale: 16384.0 | grad norm: 88365.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4117/ 159576 | consumed samples: 84496 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.340E-05 | global batch size: 32 | lm loss: 6.544386E+00 | loss scale: 16384.0 | grad norm: 94647.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4118/ 159576 | consumed samples: 84528 | elapsed time per iteration (ms): 14520.8 | learning rate: 2.341E-05 | global batch size: 32 | lm loss: 6.494756E+00 | loss scale: 16384.0 | grad norm: 127914.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4119/ 159576 | consumed samples: 84560 | elapsed time per iteration (ms): 14810.4 | learning rate: 2.342E-05 | global batch size: 32 | lm loss: 6.676927E+00 | loss scale: 16384.0 | grad norm: 255152.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4120/ 159576 | consumed samples: 84592 | elapsed time per iteration (ms): 14553.6 | learning rate: 2.343E-05 | global batch size: 32 | lm loss: 6.521421E+00 | loss scale: 16384.0 | grad norm: 88738.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4121/ 159576 | consumed samples: 84624 | elapsed time per iteration (ms): 14615.1 | learning rate: 2.344E-05 | global batch size: 32 | lm loss: 6.422895E+00 | loss scale: 16384.0 | grad norm: 69394.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4122/ 159576 | consumed samples: 84656 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.391778E+00 | loss scale: 16384.0 | grad norm: 75006.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4123/ 159576 | consumed samples: 84688 | elapsed time per iteration (ms): 14981.6 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.569616E+00 | loss scale: 16384.0 | grad norm: 89357.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4124/ 159576 | consumed samples: 84720 | elapsed time per iteration (ms): 14751.3 | learning rate: 2.346E-05 | global batch size: 32 | lm loss: 6.522147E+00 | loss scale: 16384.0 | grad norm: 83006.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4125/ 159576 | consumed samples: 84752 | elapsed time per iteration (ms): 14464.7 | learning rate: 2.347E-05 | global batch size: 32 | lm loss: 
6.443343E+00 | loss scale: 16384.0 | grad norm: 85692.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4126/ 159576 | consumed samples: 84784 | elapsed time per iteration (ms): 14544.8 | learning rate: 2.348E-05 | global batch size: 32 | lm loss: 6.447396E+00 | loss scale: 16384.0 | grad norm: 75026.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4127/ 159576 | consumed samples: 84816 | elapsed time per iteration (ms): 14837.3 | learning rate: 2.349E-05 | global batch size: 32 | lm loss: 6.407457E+00 | loss scale: 16384.0 | grad norm: 68031.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4128/ 159576 | consumed samples: 84848 | elapsed time per iteration (ms): 14497.8 | learning rate: 2.350E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 16384.0 | grad norm: 81823.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4129/ 159576 | consumed samples: 84880 | elapsed time per iteration (ms): 14560.1 | learning rate: 2.351E-05 | global batch size: 32 | lm loss: 6.349816E+00 | loss scale: 16384.0 | grad norm: 72346.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4130/ 159576 | consumed samples: 84912 | elapsed time per iteration (ms): 14548.5 | learning rate: 2.352E-05 | global batch size: 32 | lm loss: 6.479569E+00 | loss scale: 16384.0 | grad norm: 87336.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4131/ 159576 | consumed samples: 84944 | elapsed time per iteration (ms): 14910.1 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.617517E+00 | loss scale: 16384.0 | grad norm: 86374.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4132/ 159576 | consumed samples: 84976 | elapsed time per iteration (ms): 14494.2 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.465295E+00 | loss scale: 16384.0 | grad norm: 84022.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4133/ 159576 | consumed samples: 85008 | elapsed time per iteration (ms): 14507.6 | learning rate: 2.354E-05 | global batch size: 32 | lm loss: 6.496157E+00 | loss scale: 16384.0 | grad norm: 84787.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4134/ 159576 | consumed samples: 85040 | elapsed time per iteration (ms): 14524.7 | learning rate: 2.355E-05 | global batch size: 32 | lm loss: 6.413724E+00 | loss scale: 16384.0 | grad norm: 85852.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4135/ 159576 | consumed samples: 85072 | elapsed time per iteration (ms): 14838.8 | learning rate: 2.356E-05 | global batch size: 32 | lm loss: 6.625166E+00 | loss scale: 16384.0 | grad norm: 94635.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4136/ 159576 | consumed samples: 85104 | elapsed time per iteration (ms): 14542.4 | learning rate: 2.357E-05 | global batch size: 32 | lm loss: 6.407034E+00 | loss scale: 16384.0 | grad norm: 84861.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4137/ 159576 | consumed samples: 85136 
| elapsed time per iteration (ms): 14613.1 | learning rate: 2.358E-05 | global batch size: 32 | lm loss: 6.522691E+00 | loss scale: 16384.0 | grad norm: 90819.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4138/ 159576 | consumed samples: 85168 | elapsed time per iteration (ms): 14588.1 | learning rate: 2.359E-05 | global batch size: 32 | lm loss: 6.515704E+00 | loss scale: 16384.0 | grad norm: 84641.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4139/ 159576 | consumed samples: 85200 | elapsed time per iteration (ms): 14775.7 | learning rate: 2.360E-05 | global batch size: 32 | lm loss: 6.462790E+00 | loss scale: 16384.0 | grad norm: 109335.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4140/ 159576 | consumed samples: 85232 | elapsed time per iteration (ms): 14632.9 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.565165E+00 | loss scale: 16384.0 | grad norm: 101408.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4141/ 159576 | consumed samples: 85264 | elapsed time per iteration (ms): 14488.2 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.378877E+00 | loss scale: 16384.0 | grad norm: 85177.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4142/ 159576 | consumed samples: 85296 | elapsed time per iteration (ms): 14538.0 | learning rate: 2.362E-05 | global batch size: 32 | lm loss: 6.464640E+00 | loss scale: 16384.0 | grad norm: 107413.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4143/ 159576 | consumed samples: 85328 | elapsed time per iteration (ms): 14656.2 | learning rate: 2.363E-05 | global batch size: 32 | lm loss: 6.672103E+00 | loss scale: 16384.0 | grad norm: 79187.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4144/ 159576 | consumed samples: 85360 | elapsed time per iteration (ms): 14916.7 | learning rate: 2.364E-05 | global batch size: 32 | lm loss: 6.691429E+00 | loss scale: 16384.0 | grad norm: 105292.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4145/ 159576 | consumed samples: 85392 | elapsed time per iteration (ms): 14496.1 | learning rate: 2.365E-05 | global batch size: 32 | lm loss: 6.428411E+00 | loss scale: 16384.0 | grad norm: 81232.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4146/ 159576 | consumed samples: 85424 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.366E-05 | global batch size: 32 | lm loss: 6.483904E+00 | loss scale: 16384.0 | grad norm: 117143.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4147/ 159576 | consumed samples: 85456 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.367E-05 | global batch size: 32 | lm loss: 6.363456E+00 | loss scale: 16384.0 | grad norm: 88860.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4148/ 159576 | consumed samples: 85488 | elapsed time per iteration (ms): 14766.7 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.523079E+00 | loss scale: 16384.0 | grad norm: 87677.210 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4149/ 159576 | consumed samples: 85520 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.553520E+00 | loss scale: 16384.0 | grad norm: 121742.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4150/ 159576 | consumed samples: 85552 | elapsed time per iteration (ms): 14548.6 | learning rate: 2.369E-05 | global batch size: 32 | lm loss: 6.490498E+00 | loss scale: 16384.0 | grad norm: 89599.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4151/ 159576 | consumed samples: 85584 | elapsed time per iteration (ms): 14535.8 | learning rate: 2.370E-05 | global batch size: 32 | lm loss: 6.498284E+00 | loss scale: 16384.0 | grad norm: 103857.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4152/ 159576 | consumed samples: 85616 | elapsed time per iteration (ms): 14637.7 | learning rate: 2.371E-05 | global batch size: 32 | lm loss: 6.607250E+00 | loss scale: 16384.0 | grad norm: 80792.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4153/ 159576 | consumed samples: 85648 | elapsed time per iteration (ms): 14584.8 | learning rate: 2.372E-05 | global batch size: 32 | lm loss: 6.465719E+00 | loss scale: 16384.0 | grad norm: 76852.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4154/ 159576 | consumed samples: 85680 | elapsed time per iteration (ms): 14575.3 | learning rate: 2.373E-05 | global batch size: 32 | lm loss: 6.475266E+00 | loss scale: 16384.0 | grad norm: 87775.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4155/ 159576 | consumed samples: 85712 | elapsed time per iteration (ms): 14452.5 | learning rate: 2.374E-05 | global batch size: 32 | lm loss: 6.456027E+00 | loss scale: 16384.0 | grad norm: 75377.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4156/ 159576 | consumed samples: 85744 | elapsed time per iteration (ms): 14769.4 | learning rate: 2.375E-05 | global batch size: 32 | lm loss: 6.436621E+00 | loss scale: 16384.0 | grad norm: 86270.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4157/ 159576 | consumed samples: 85776 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.502521E+00 | loss scale: 16384.0 | grad norm: 77291.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4158/ 159576 | consumed samples: 85808 | elapsed time per iteration (ms): 14605.4 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.271915E+00 | loss scale: 16384.0 | grad norm: 79782.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4159/ 159576 | consumed samples: 85840 | elapsed time per iteration (ms): 14468.5 | learning rate: 2.377E-05 | global batch size: 32 | lm loss: 6.375775E+00 | loss scale: 16384.0 | grad norm: 91679.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4160/ 159576 | consumed samples: 85872 | elapsed time per iteration (ms): 15055.2 | learning rate: 2.378E-05 | global batch 
size: 32 | lm loss: 6.207356E+00 | loss scale: 16384.0 | grad norm: 84700.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4161/ 159576 | consumed samples: 85904 | elapsed time per iteration (ms): 14639.9 | learning rate: 2.379E-05 | global batch size: 32 | lm loss: 6.385208E+00 | loss scale: 16384.0 | grad norm: 77383.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4162/ 159576 | consumed samples: 85936 | elapsed time per iteration (ms): 14461.5 | learning rate: 2.380E-05 | global batch size: 32 | lm loss: 6.480938E+00 | loss scale: 16384.0 | grad norm: 98154.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4163/ 159576 | consumed samples: 85968 | elapsed time per iteration (ms): 14557.2 | learning rate: 2.381E-05 | global batch size: 32 | lm loss: 6.427241E+00 | loss scale: 16384.0 | grad norm: 79663.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4164/ 159576 | consumed samples: 86000 | elapsed time per iteration (ms): 15046.3 | learning rate: 2.382E-05 | global batch size: 32 | lm loss: 6.310709E+00 | loss scale: 16384.0 | grad norm: 76469.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4165/ 159576 | consumed samples: 86032 | elapsed time per iteration (ms): 14517.1 | learning rate: 2.383E-05 | global batch size: 32 | lm loss: 6.597423E+00 | loss scale: 16384.0 | grad norm: 95179.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4166/ 159576 | consumed samples: 86064 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.398317E+00 | loss scale: 16384.0 | grad norm: 86889.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4167/ 159576 | consumed samples: 86096 | elapsed time per iteration (ms): 14577.1 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.447660E+00 | loss scale: 16384.0 | grad norm: 99510.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4168/ 159576 | consumed samples: 86128 | elapsed time per iteration (ms): 14813.0 | learning rate: 2.385E-05 | global batch size: 32 | lm loss: 6.528482E+00 | loss scale: 16384.0 | grad norm: 83413.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4169/ 159576 | consumed samples: 86160 | elapsed time per iteration (ms): 14589.9 | learning rate: 2.386E-05 | global batch size: 32 | lm loss: 6.388697E+00 | loss scale: 16384.0 | grad norm: 76722.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4170/ 159576 | consumed samples: 86192 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.387E-05 | global batch size: 32 | lm loss: 6.446240E+00 | loss scale: 16384.0 | grad norm: 85947.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4171/ 159576 | consumed samples: 86224 | elapsed time per iteration (ms): 14524.6 | learning rate: 2.388E-05 | global batch size: 32 | lm loss: 6.425363E+00 | loss scale: 16384.0 | grad norm: 88474.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4172/ 159576 | 
consumed samples: 86256 | elapsed time per iteration (ms): 14879.2 | learning rate: 2.389E-05 | global batch size: 32 | lm loss: 6.515138E+00 | loss scale: 16384.0 | grad norm: 108134.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4173/ 159576 | consumed samples: 86288 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.390E-05 | global batch size: 32 | lm loss: 6.533965E+00 | loss scale: 16384.0 | grad norm: 76749.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4174/ 159576 | consumed samples: 86320 | elapsed time per iteration (ms): 14543.3 | learning rate: 2.391E-05 | global batch size: 32 | lm loss: 6.448212E+00 | loss scale: 16384.0 | grad norm: 93972.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4175/ 159576 | consumed samples: 86352 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.440217E+00 | loss scale: 16384.0 | grad norm: 102291.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4176/ 159576 | consumed samples: 86384 | elapsed time per iteration (ms): 14897.3 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.324600E+00 | loss scale: 16384.0 | grad norm: 81057.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4177/ 159576 | consumed samples: 86416 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.393E-05 | global batch size: 32 | lm loss: 6.564878E+00 | loss scale: 16384.0 | grad norm: 96270.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4178/ 159576 | consumed samples: 86448 | elapsed time per iteration (ms): 14585.7 | learning rate: 2.394E-05 | global batch size: 32 | lm loss: 6.473108E+00 | loss scale: 16384.0 | grad norm: 80498.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4179/ 159576 | consumed samples: 86480 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.395E-05 | global batch size: 32 | lm loss: 6.519761E+00 | loss scale: 16384.0 | grad norm: 90509.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4180/ 159576 | consumed samples: 86512 | elapsed time per iteration (ms): 14895.7 | learning rate: 2.396E-05 | global batch size: 32 | lm loss: 6.377243E+00 | loss scale: 16384.0 | grad norm: 92370.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4181/ 159576 | consumed samples: 86544 | elapsed time per iteration (ms): 14690.0 | learning rate: 2.397E-05 | global batch size: 32 | lm loss: 6.469300E+00 | loss scale: 16384.0 | grad norm: 89492.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4182/ 159576 | consumed samples: 86576 | elapsed time per iteration (ms): 14557.6 | learning rate: 2.398E-05 | global batch size: 32 | lm loss: 6.497668E+00 | loss scale: 16384.0 | grad norm: 104899.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4183/ 159576 | consumed samples: 86608 | elapsed time per iteration (ms): 14588.2 | learning rate: 2.399E-05 | global batch size: 32 | lm loss: 6.412446E+00 | loss scale: 16384.0 | grad norm: 81267.948 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4184/ 159576 | consumed samples: 86640 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.486274E+00 | loss scale: 16384.0 | grad norm: 95404.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4185/ 159576 | consumed samples: 86672 | elapsed time per iteration (ms): 14942.6 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.375100E+00 | loss scale: 16384.0 | grad norm: 82372.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4186/ 159576 | consumed samples: 86704 | elapsed time per iteration (ms): 14540.4 | learning rate: 2.401E-05 | global batch size: 32 | lm loss: 6.444688E+00 | loss scale: 16384.0 | grad norm: 102268.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4187/ 159576 | consumed samples: 86736 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.402E-05 | global batch size: 32 | lm loss: 6.270885E+00 | loss scale: 16384.0 | grad norm: 85114.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4188/ 159576 | consumed samples: 86768 | elapsed time per iteration (ms): 14554.4 | learning rate: 2.403E-05 | global batch size: 32 | lm loss: 6.461191E+00 | loss scale: 16384.0 | grad norm: 82795.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4189/ 159576 | consumed samples: 86800 | elapsed time per iteration (ms): 14680.7 | learning rate: 2.404E-05 | global batch size: 32 | lm loss: 6.483377E+00 | loss scale: 16384.0 | grad norm: 106142.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4190/ 159576 | consumed samples: 86832 | elapsed time per iteration (ms): 14652.1 | learning rate: 2.405E-05 | global batch size: 32 | lm loss: 6.468819E+00 | loss scale: 16384.0 | grad norm: 83557.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4191/ 159576 | consumed samples: 86864 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.406E-05 | global batch size: 32 | lm loss: 6.379012E+00 | loss scale: 16384.0 | grad norm: 90619.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4192/ 159576 | consumed samples: 86896 | elapsed time per iteration (ms): 14539.1 | learning rate: 2.407E-05 | global batch size: 32 | lm loss: 6.459314E+00 | loss scale: 16384.0 | grad norm: 94282.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4193/ 159576 | consumed samples: 86928 | elapsed time per iteration (ms): 14715.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.435170E+00 | loss scale: 16384.0 | grad norm: 92946.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4194/ 159576 | consumed samples: 86960 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.419791E+00 | loss scale: 16384.0 | grad norm: 78251.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4195/ 159576 | consumed samples: 86992 | elapsed time per iteration (ms): 14523.0 | learning rate: 
2.409E-05 | global batch size: 32 | lm loss: 6.342591E+00 | loss scale: 16384.0 | grad norm: 80571.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4196/ 159576 | consumed samples: 87024 | elapsed time per iteration (ms): 14595.3 | learning rate: 2.410E-05 | global batch size: 32 | lm loss: 6.373145E+00 | loss scale: 16384.0 | grad norm: 106409.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4197/ 159576 | consumed samples: 87056 | elapsed time per iteration (ms): 14737.5 | learning rate: 2.411E-05 | global batch size: 32 | lm loss: 6.543087E+00 | loss scale: 16384.0 | grad norm: 81359.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4198/ 159576 | consumed samples: 87088 | elapsed time per iteration (ms): 14570.3 | learning rate: 2.412E-05 | global batch size: 32 | lm loss: 6.555972E+00 | loss scale: 16384.0 | grad norm: 101442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4199/ 159576 | consumed samples: 87120 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.413E-05 | global batch size: 32 | lm loss: 6.497987E+00 | loss scale: 16384.0 | grad norm: 87789.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4200/ 159576 | consumed samples: 87152 | elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | grad norm: 97375.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4201/ 159576 | consumed samples: 87184 | elapsed time per iteration (ms): 14967.8 | learning rate: 2.415E-05 | global batch size: 32 | lm loss: 6.529594E+00 | loss scale: 16384.0 | grad norm: 98056.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4202/ 159576 | consumed samples: 87216 | elapsed time per iteration (ms): 14591.5 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.461559E+00 | loss scale: 16384.0 | grad norm: 103248.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4203/ 159576 | consumed samples: 87248 | elapsed time per iteration (ms): 14557.3 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.255905E+00 | loss scale: 16384.0 | grad norm: 98489.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4204/ 159576 | consumed samples: 87280 | elapsed time per iteration (ms): 14539.8 | learning rate: 2.417E-05 | global batch size: 32 | lm loss: 6.456792E+00 | loss scale: 16384.0 | grad norm: 90220.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4205/ 159576 | consumed samples: 87312 | elapsed time per iteration (ms): 14936.2 | learning rate: 2.418E-05 | global batch size: 32 | lm loss: 6.456956E+00 | loss scale: 16384.0 | grad norm: 99591.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4206/ 159576 | consumed samples: 87344 | elapsed time per iteration (ms): 14602.1 | learning rate: 2.419E-05 | global batch size: 32 | lm loss: 6.539675E+00 | loss scale: 16384.0 | grad norm: 106461.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
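(Aside.) Every record in this stretch of the log follows the same fixed field layout, so it can be scraped mechanically for plotting or monitoring. Below is a minimal, hypothetical Python sketch -- the regex and the parse_records helper are illustrative, not part of Megatron-LM or the tr8-104B tooling -- that pulls the numeric fields out of records shaped like the ones above:

import re

# Hypothetical helper: extract the numeric fields from one
# "iteration N/ M | ..." record as printed in this log.
RECORD = re.compile(
    r"iteration\s+(?P<iteration>\d+)/\s*(?P<total>\d+)\s*\|"
    r"\s*consumed samples:\s*(?P<consumed>\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*(?P<ms>[\d.]+)\s*\|"
    r"\s*learning rate:\s*(?P<lr>[\dEe.+-]+)\s*\|"
    r"\s*global batch size:\s*(?P<gbs>\d+)\s*\|"
    r"\s*lm loss:\s*(?P<loss>[\dEe.+-]+)\s*\|"
    r"\s*loss scale:\s*(?P<scale>[\d.]+)\s*\|"
    r"\s*grad norm:\s*(?P<gnorm>[\d.]+)"
)

def parse_records(text):
    """Yield one dict of numeric fields per iteration record found in text."""
    for m in RECORD.finditer(text):
        d = m.groupdict()
        yield {
            "iteration": int(d["iteration"]),
            "consumed_samples": int(d["consumed"]),
            "ms_per_iter": float(d["ms"]),
            "lr": float(d["lr"]),
            "global_batch_size": int(d["gbs"]),
            "lm_loss": float(d["loss"]),
            "loss_scale": float(d["scale"]),
            "grad_norm": float(d["gnorm"]),
        }

# Example with a record taken verbatim from this section of the log:
line = ("iteration 4200/ 159576 | consumed samples: 87152 | "
        "elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | "
        "global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | "
        "grad norm: 97375.608")
print(next(parse_records(line)))

Fed the raw log text, this yields one dict per iteration. Note that consumed samples advance by exactly one global batch (32) per step throughout this stretch, e.g. 87120 + 32 = 87152 between iterations 4199 and 4200.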
iteration 4207/ 159576 | consumed samples: 87376 | elapsed time per iteration (ms): 14518.5 | learning rate: 2.420E-05 | global batch size: 32 | lm loss: 6.581583E+00 | loss scale: 16384.0 | grad norm: 104474.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4208/ 159576 | consumed samples: 87408 | elapsed time per iteration (ms): 14546.2 | learning rate: 2.421E-05 | global batch size: 32 | lm loss: 6.470299E+00 | loss scale: 16384.0 | grad norm: 103936.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4209/ 159576 | consumed samples: 87440 | elapsed time per iteration (ms): 14895.0 | learning rate: 2.422E-05 | global batch size: 32 | lm loss: 6.485046E+00 | loss scale: 16384.0 | grad norm: 103480.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4210/ 159576 | consumed samples: 87472 | elapsed time per iteration (ms): 14490.7 | learning rate: 2.423E-05 | global batch size: 32 | lm loss: 6.331614E+00 | loss scale: 16384.0 | grad norm: 92393.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4211/ 159576 | consumed samples: 87504 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.343493E+00 | loss scale: 16384.0 | grad norm: 138840.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4212/ 159576 | consumed samples: 87536 | elapsed time per iteration (ms): 14559.8 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.362164E+00 | loss scale: 16384.0 | grad norm: 105314.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4213/ 159576 | consumed samples: 87568 | elapsed time per iteration (ms): 14962.7 | learning rate: 2.425E-05 | global batch size: 32 | lm loss: 6.413978E+00 | loss scale: 16384.0 | grad norm: 100396.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4214/ 159576 | consumed samples: 87600 | elapsed time per iteration (ms): 14459.8 | learning rate: 2.426E-05 | global batch size: 32 | lm loss: 6.333343E+00 | loss scale: 16384.0 | grad norm: 101809.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4215/ 159576 | consumed samples: 87632 | elapsed time per iteration (ms): 14541.9 | learning rate: 2.427E-05 | global batch size: 32 | lm loss: 6.552740E+00 | loss scale: 16384.0 | grad norm: 198031.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4216/ 159576 | consumed samples: 87664 | elapsed time per iteration (ms): 14546.7 | learning rate: 2.428E-05 | global batch size: 32 | lm loss: 6.373903E+00 | loss scale: 16384.0 | grad norm: 98034.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4217/ 159576 | consumed samples: 87696 | elapsed time per iteration (ms): 14848.3 | learning rate: 2.429E-05 | global batch size: 32 | lm loss: 6.452424E+00 | loss scale: 16384.0 | grad norm: 267522.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4218/ 159576 | consumed samples: 87728 | elapsed time per iteration (ms): 14570.6 | learning rate: 2.430E-05 | global batch size: 32 | lm loss: 6.493920E+00 | loss scale: 16384.0 | 
grad norm: 121372.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4219/ 159576 | consumed samples: 87760 | elapsed time per iteration (ms): 14553.1 | learning rate: 2.431E-05 | global batch size: 32 | lm loss: 6.478834E+00 | loss scale: 16384.0 | grad norm: 112151.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4220/ 159576 | consumed samples: 87792 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.452081E+00 | loss scale: 16384.0 | grad norm: 164176.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4221/ 159576 | consumed samples: 87824 | elapsed time per iteration (ms): 14866.7 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.616721E+00 | loss scale: 16384.0 | grad norm: 88412.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4222/ 159576 | consumed samples: 87856 | elapsed time per iteration (ms): 14831.9 | learning rate: 2.433E-05 | global batch size: 32 | lm loss: 6.396004E+00 | loss scale: 16384.0 | grad norm: 116548.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4223/ 159576 | consumed samples: 87888 | elapsed time per iteration (ms): 14530.1 | learning rate: 2.434E-05 | global batch size: 32 | lm loss: 6.223457E+00 | loss scale: 16384.0 | grad norm: 151936.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4224/ 159576 | consumed samples: 87920 | elapsed time per iteration (ms): 14526.4 | learning rate: 2.435E-05 | global batch size: 32 | lm loss: 6.471479E+00 | loss scale: 16384.0 | grad norm: 107150.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4225/ 159576 | consumed samples: 87952 | elapsed time per iteration (ms): 14556.3 | learning rate: 2.436E-05 | global batch size: 32 | lm loss: 6.420123E+00 | loss scale: 16384.0 | grad norm: 118336.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4226/ 159576 | consumed samples: 87984 | elapsed time per iteration (ms): 14779.5 | learning rate: 2.437E-05 | global batch size: 32 | lm loss: 6.463729E+00 | loss scale: 16384.0 | grad norm: 105104.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4227/ 159576 | consumed samples: 88016 | elapsed time per iteration (ms): 14616.1 | learning rate: 2.438E-05 | global batch size: 32 | lm loss: 6.384348E+00 | loss scale: 16384.0 | grad norm: 121857.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4228/ 159576 | consumed samples: 88048 | elapsed time per iteration (ms): 14595.0 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.562186E+00 | loss scale: 16384.0 | grad norm: 120895.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4229/ 159576 | consumed samples: 88080 | elapsed time per iteration (ms): 14592.9 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.614166E+00 | loss scale: 16384.0 | grad norm: 141989.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4230/ 159576 | consumed samples: 88112 | elapsed time per 
iteration (ms): 14745.8 | learning rate: 2.440E-05 | global batch size: 32 | lm loss: 6.416856E+00 | loss scale: 16384.0 | grad norm: 135385.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4231/ 159576 | consumed samples: 88144 | elapsed time per iteration (ms): 14547.3 | learning rate: 2.441E-05 | global batch size: 32 | lm loss: 6.576384E+00 | loss scale: 16384.0 | grad norm: 129034.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4232/ 159576 | consumed samples: 88176 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.442E-05 | global batch size: 32 | lm loss: 6.371499E+00 | loss scale: 16384.0 | grad norm: 102463.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4233/ 159576 | consumed samples: 88208 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.443E-05 | global batch size: 32 | lm loss: 6.598085E+00 | loss scale: 16384.0 | grad norm: 105075.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4234/ 159576 | consumed samples: 88240 | elapsed time per iteration (ms): 14766.2 | learning rate: 2.444E-05 | global batch size: 32 | lm loss: 6.536204E+00 | loss scale: 16384.0 | grad norm: 109004.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4235/ 159576 | consumed samples: 88272 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.445E-05 | global batch size: 32 | lm loss: 6.663161E+00 | loss scale: 16384.0 | grad norm: 197099.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4236/ 159576 | consumed samples: 88304 | elapsed time per iteration (ms): 14598.2 | learning rate: 2.446E-05 | global batch size: 32 | lm loss: 6.451008E+00 | loss scale: 16384.0 | grad norm: 125746.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4237/ 159576 | consumed samples: 88336 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.306778E+00 | loss scale: 16384.0 | grad norm: 145717.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4238/ 159576 | consumed samples: 88368 | elapsed time per iteration (ms): 14844.4 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.637146E+00 | loss scale: 16384.0 | grad norm: 161986.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4239/ 159576 | consumed samples: 88400 | elapsed time per iteration (ms): 14550.6 | learning rate: 2.448E-05 | global batch size: 32 | lm loss: 6.518569E+00 | loss scale: 16384.0 | grad norm: 114815.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4240/ 159576 | consumed samples: 88432 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.449E-05 | global batch size: 32 | lm loss: 6.644086E+00 | loss scale: 16384.0 | grad norm: 127083.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4241/ 159576 | consumed samples: 88464 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.450E-05 | global batch size: 32 | lm loss: 6.359149E+00 | loss scale: 16384.0 | grad norm: 119916.985 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4242/ 159576 | consumed samples: 88496 | elapsed time per iteration (ms): 14950.3 | learning rate: 2.451E-05 | global batch size: 32 | lm loss: 6.517668E+00 | loss scale: 16384.0 | grad norm: 116850.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4243/ 159576 | consumed samples: 88528 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.452E-05 | global batch size: 32 | lm loss: 6.345152E+00 | loss scale: 16384.0 | grad norm: 106829.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4244/ 159576 | consumed samples: 88560 | elapsed time per iteration (ms): 14588.0 | learning rate: 2.453E-05 | global batch size: 32 | lm loss: 6.476923E+00 | loss scale: 16384.0 | grad norm: 121409.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4245/ 159576 | consumed samples: 88592 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.454E-05 | global batch size: 32 | lm loss: 6.428369E+00 | loss scale: 16384.0 | grad norm: 99872.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4246/ 159576 | consumed samples: 88624 | elapsed time per iteration (ms): 15044.1 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.447415E+00 | loss scale: 16384.0 | grad norm: 102765.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4247/ 159576 | consumed samples: 88656 | elapsed time per iteration (ms): 14546.9 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.336578E+00 | loss scale: 16384.0 | grad norm: 90835.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4248/ 159576 | consumed samples: 88688 | elapsed time per iteration (ms): 14540.1 | learning rate: 2.456E-05 | global batch size: 32 | lm loss: 6.555513E+00 | loss scale: 16384.0 | grad norm: 104407.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4249/ 159576 | consumed samples: 88720 | elapsed time per iteration (ms): 14613.4 | learning rate: 2.457E-05 | global batch size: 32 | lm loss: 6.546042E+00 | loss scale: 16384.0 | grad norm: 115379.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4250/ 159576 | consumed samples: 88752 | elapsed time per iteration (ms): 14829.6 | learning rate: 2.458E-05 | global batch size: 32 | lm loss: 6.436588E+00 | loss scale: 16384.0 | grad norm: 107293.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4251/ 159576 | consumed samples: 88784 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.459E-05 | global batch size: 32 | lm loss: 6.438442E+00 | loss scale: 16384.0 | grad norm: 105034.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4252/ 159576 | consumed samples: 88816 | elapsed time per iteration (ms): 14563.6 | learning rate: 2.460E-05 | global batch size: 32 | lm loss: 6.473608E+00 | loss scale: 16384.0 | grad norm: 84036.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4253/ 159576 | consumed samples: 88848 | elapsed time per iteration (ms): 14528.1 | learning rate: 2.461E-05 | global batch 
size: 32 | lm loss: 6.422614E+00 | loss scale: 16384.0 | grad norm: 95068.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4254/ 159576 | consumed samples: 88880 | elapsed time per iteration (ms): 14918.1 | learning rate: 2.462E-05 | global batch size: 32 | lm loss: 6.295578E+00 | loss scale: 16384.0 | grad norm: 114489.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4255/ 159576 | consumed samples: 88912 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.416272E+00 | loss scale: 16384.0 | grad norm: 91261.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4256/ 159576 | consumed samples: 88944 | elapsed time per iteration (ms): 14525.5 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.517479E+00 | loss scale: 32768.0 | grad norm: 94254.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4257/ 159576 | consumed samples: 88976 | elapsed time per iteration (ms): 14555.5 | learning rate: 2.464E-05 | global batch size: 32 | lm loss: 6.469455E+00 | loss scale: 32768.0 | grad norm: 174372.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4258/ 159576 | consumed samples: 89008 | elapsed time per iteration (ms): 14928.2 | learning rate: 2.465E-05 | global batch size: 32 | lm loss: 6.408867E+00 | loss scale: 32768.0 | grad norm: 205212.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4259/ 159576 | consumed samples: 89040 | elapsed time per iteration (ms): 14529.5 | learning rate: 2.466E-05 | global batch size: 32 | lm loss: 6.518348E+00 | loss scale: 32768.0 | grad norm: 175125.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4260/ 159576 | consumed samples: 89072 | elapsed time per iteration (ms): 14608.9 | learning rate: 2.467E-05 | global batch size: 32 | lm loss: 6.456366E+00 | loss scale: 32768.0 | grad norm: 180925.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4261/ 159576 | consumed samples: 89104 | elapsed time per iteration (ms): 14541.2 | learning rate: 2.468E-05 | global batch size: 32 | lm loss: 6.688640E+00 | loss scale: 32768.0 | grad norm: 205129.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4262/ 159576 | consumed samples: 89136 | elapsed time per iteration (ms): 14984.8 | learning rate: 2.469E-05 | global batch size: 32 | lm loss: 6.381848E+00 | loss scale: 32768.0 | grad norm: 194086.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4263/ 159576 | consumed samples: 89168 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.470E-05 | global batch size: 32 | lm loss: 6.325251E+00 | loss scale: 32768.0 | grad norm: 200329.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4264/ 159576 | consumed samples: 89200 | elapsed time per iteration (ms): 14514.4 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.384187E+00 | loss scale: 32768.0 | grad norm: 206513.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4265/ 
159576 | consumed samples: 89232 | elapsed time per iteration (ms): 14532.8 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.524798E+00 | loss scale: 32768.0 | grad norm: 207588.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4266/ 159576 | consumed samples: 89264 | elapsed time per iteration (ms): 14499.0 | learning rate: 2.472E-05 | global batch size: 32 | lm loss: 6.427965E+00 | loss scale: 32768.0 | grad norm: 270396.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4267/ 159576 | consumed samples: 89296 | elapsed time per iteration (ms): 14964.3 | learning rate: 2.473E-05 | global batch size: 32 | lm loss: 6.508441E+00 | loss scale: 32768.0 | grad norm: 256825.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4268/ 159576 | consumed samples: 89328 | elapsed time per iteration (ms): 14573.4 | learning rate: 2.474E-05 | global batch size: 32 | lm loss: 6.281446E+00 | loss scale: 32768.0 | grad norm: 175050.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4269/ 159576 | consumed samples: 89360 | elapsed time per iteration (ms): 14497.3 | learning rate: 2.475E-05 | global batch size: 32 | lm loss: 6.477619E+00 | loss scale: 32768.0 | grad norm: 194699.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4270/ 159576 | consumed samples: 89392 | elapsed time per iteration (ms): 14560.8 | learning rate: 2.476E-05 | global batch size: 32 | lm loss: 6.521669E+00 | loss scale: 32768.0 | grad norm: 204025.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4271/ 159576 | consumed samples: 89424 | elapsed time per iteration (ms): 14634.9 | learning rate: 2.477E-05 | global batch size: 32 | lm loss: 6.532991E+00 | loss scale: 32768.0 | grad norm: 218350.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4272/ 159576 | consumed samples: 89456 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.478E-05 | global batch size: 32 | lm loss: 6.491451E+00 | loss scale: 32768.0 | grad norm: 196213.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4273/ 159576 | consumed samples: 89488 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.527338E+00 | loss scale: 32768.0 | grad norm: 254430.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4274/ 159576 | consumed samples: 89520 | elapsed time per iteration (ms): 14538.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.303001E+00 | loss scale: 32768.0 | grad norm: 189173.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4275/ 159576 | consumed samples: 89552 | elapsed time per iteration (ms): 14691.4 | learning rate: 2.480E-05 | global batch size: 32 | lm loss: 6.465518E+00 | loss scale: 32768.0 | grad norm: 266867.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4276/ 159576 | consumed samples: 89584 | elapsed time per iteration (ms): 14571.4 | learning rate: 2.481E-05 | global batch size: 32 | lm loss: 6.562708E+00 | loss scale: 32768.0 | grad norm: 
213181.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4277/ 159576 | consumed samples: 89616 | elapsed time per iteration (ms): 14513.3 | learning rate: 2.482E-05 | global batch size: 32 | lm loss: 6.490031E+00 | loss scale: 32768.0 | grad norm: 200238.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4278/ 159576 | consumed samples: 89648 | elapsed time per iteration (ms): 14545.3 | learning rate: 2.483E-05 | global batch size: 32 | lm loss: 6.452188E+00 | loss scale: 32768.0 | grad norm: 209603.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4279/ 159576 | consumed samples: 89680 | elapsed time per iteration (ms): 14892.6 | learning rate: 2.484E-05 | global batch size: 32 | lm loss: 6.402837E+00 | loss scale: 32768.0 | grad norm: 213512.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4280/ 159576 | consumed samples: 89712 | elapsed time per iteration (ms): 14552.6 | learning rate: 2.485E-05 | global batch size: 32 | lm loss: 6.481530E+00 | loss scale: 32768.0 | grad norm: 218939.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4281/ 159576 | consumed samples: 89744 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.486E-05 | global batch size: 32 | lm loss: 6.481557E+00 | loss scale: 32768.0 | grad norm: 211553.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4282/ 159576 | consumed samples: 89776 | elapsed time per iteration (ms): 14536.1 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.396571E+00 | loss scale: 32768.0 | grad norm: 200119.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4283/ 159576 | consumed samples: 89808 | elapsed time per iteration (ms): 14897.4 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.437448E+00 | loss scale: 32768.0 | grad norm: 211733.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4284/ 159576 | consumed samples: 89840 | elapsed time per iteration (ms): 14635.9 | learning rate: 2.488E-05 | global batch size: 32 | lm loss: 6.477830E+00 | loss scale: 32768.0 | grad norm: 273937.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4285/ 159576 | consumed samples: 89872 | elapsed time per iteration (ms): 14565.4 | learning rate: 2.489E-05 | global batch size: 32 | lm loss: 6.567824E+00 | loss scale: 32768.0 | grad norm: 210402.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4286/ 159576 | consumed samples: 89904 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.490E-05 | global batch size: 32 | lm loss: 6.385768E+00 | loss scale: 32768.0 | grad norm: 203200.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4287/ 159576 | consumed samples: 89936 | elapsed time per iteration (ms): 14914.9 | learning rate: 2.491E-05 | global batch size: 32 | lm loss: 6.397992E+00 | loss scale: 32768.0 | grad norm: 182816.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4288/ 159576 | consumed samples: 89968 | elapsed time per iteration (ms): 
14476.6 | learning rate: 2.492E-05 | global batch size: 32 | lm loss: 6.388610E+00 | loss scale: 32768.0 | grad norm: 199735.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4289/ 159576 | consumed samples: 90000 | elapsed time per iteration (ms): 14570.5 | learning rate: 2.493E-05 | global batch size: 32 | lm loss: 6.506209E+00 | loss scale: 32768.0 | grad norm: 206990.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4290/ 159576 | consumed samples: 90032 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.494E-05 | global batch size: 32 | lm loss: 6.351604E+00 | loss scale: 32768.0 | grad norm: 204481.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4291/ 159576 | consumed samples: 90064 | elapsed time per iteration (ms): 14860.6 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.518882E+00 | loss scale: 32768.0 | grad norm: 236219.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4292/ 159576 | consumed samples: 90096 | elapsed time per iteration (ms): 14581.4 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.428777E+00 | loss scale: 32768.0 | grad norm: 187907.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4293/ 159576 | consumed samples: 90128 | elapsed time per iteration (ms): 14508.1 | learning rate: 2.496E-05 | global batch size: 32 | lm loss: 6.327142E+00 | loss scale: 32768.0 | grad norm: 204872.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4294/ 159576 | consumed samples: 90160 | elapsed time per iteration (ms): 14534.7 | learning rate: 2.497E-05 | global batch size: 32 | lm loss: 6.385339E+00 | loss scale: 32768.0 | grad norm: 233375.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4295/ 159576 | consumed samples: 90192 | elapsed time per iteration (ms): 14858.3 | learning rate: 2.498E-05 | global batch size: 32 | lm loss: 6.416627E+00 | loss scale: 32768.0 | grad norm: 222806.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4296/ 159576 | consumed samples: 90224 | elapsed time per iteration (ms): 14474.6 | learning rate: 2.499E-05 | global batch size: 32 | lm loss: 6.518059E+00 | loss scale: 32768.0 | grad norm: 226593.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4297/ 159576 | consumed samples: 90256 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.500E-05 | global batch size: 32 | lm loss: 6.133147E+00 | loss scale: 32768.0 | grad norm: 267419.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4298/ 159576 | consumed samples: 90288 | elapsed time per iteration (ms): 14566.4 | learning rate: 2.501E-05 | global batch size: 32 | lm loss: 6.308548E+00 | loss scale: 32768.0 | grad norm: 204598.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4299/ 159576 | consumed samples: 90320 | elapsed time per iteration (ms): 14984.7 | learning rate: 2.502E-05 | global batch size: 32 | lm loss: 6.369866E+00 | loss scale: 32768.0 | grad norm: 221545.190 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | time (ms) iteration 4300/ 159576 | consumed samples: 90352 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.530766E+00 | loss scale: 32768.0 | grad norm: 267800.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4301/ 159576 | consumed samples: 90384 | elapsed time per iteration (ms): 14557.5 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.503004E+00 | loss scale: 32768.0 | grad norm: 228461.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4302/ 159576 | consumed samples: 90416 | elapsed time per iteration (ms): 14550.0 | learning rate: 2.504E-05 | global batch size: 32 | lm loss: 6.538440E+00 | loss scale: 32768.0 | grad norm: 190026.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4303/ 159576 | consumed samples: 90448 | elapsed time per iteration (ms): 14655.7 | learning rate: 2.505E-05 | global batch size: 32 | lm loss: 6.461242E+00 | loss scale: 32768.0 | grad norm: 211257.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4304/ 159576 | consumed samples: 90480 | elapsed time per iteration (ms): 14769.1 | learning rate: 2.506E-05 | global batch size: 32 | lm loss: 6.479248E+00 | loss scale: 32768.0 | grad norm: 198712.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4305/ 159576 | consumed samples: 90512 | elapsed time per iteration (ms): 14577.3 | learning rate: 2.507E-05 | global batch size: 32 | lm loss: 6.432651E+00 | loss scale: 32768.0 | grad norm: 206822.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4306/ 159576 | consumed samples: 90544 | elapsed time per iteration (ms): 14533.2 | learning rate: 2.508E-05 | global batch size: 32 | lm loss: 6.347961E+00 | loss scale: 32768.0 | grad norm: 195748.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4307/ 159576 | consumed samples: 90576 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.509E-05 | global batch size: 32 | lm loss: 6.507642E+00 | loss scale: 32768.0 | grad norm: 218663.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4308/ 159576 | consumed samples: 90608 | elapsed time per iteration (ms): 14732.7 | learning rate: 2.510E-05 | global batch size: 32 | lm loss: 6.541059E+00 | loss scale: 32768.0 | grad norm: 228970.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4309/ 159576 | consumed samples: 90640 | elapsed time per iteration (ms): 14469.9 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.424891E+00 | loss scale: 32768.0 | grad norm: 196198.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4310/ 159576 | consumed samples: 90672 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.490376E+00 | loss scale: 32768.0 | grad norm: 215960.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4311/ 159576 | consumed samples: 90704 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.512E-05 | global batch size: 32 | lm loss: 
6.488754E+00 | loss scale: 32768.0 | grad norm: 195374.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4312/ 159576 | consumed samples: 90736 | elapsed time per iteration (ms): 14753.9 | learning rate: 2.513E-05 | global batch size: 32 | lm loss: 6.448671E+00 | loss scale: 32768.0 | grad norm: 227732.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4313/ 159576 | consumed samples: 90768 | elapsed time per iteration (ms): 14571.8 | learning rate: 2.514E-05 | global batch size: 32 | lm loss: 6.500753E+00 | loss scale: 32768.0 | grad norm: 266264.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4314/ 159576 | consumed samples: 90800 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.515E-05 | global batch size: 32 | lm loss: 6.454448E+00 | loss scale: 32768.0 | grad norm: 224312.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4315/ 159576 | consumed samples: 90832 | elapsed time per iteration (ms): 14520.9 | learning rate: 2.516E-05 | global batch size: 32 | lm loss: 6.340928E+00 | loss scale: 32768.0 | grad norm: 252168.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4316/ 159576 | consumed samples: 90864 | elapsed time per iteration (ms): 14650.6 | learning rate: 2.517E-05 | global batch size: 32 | lm loss: 6.524774E+00 | loss scale: 32768.0 | grad norm: 233060.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4317/ 159576 | consumed samples: 90896 | elapsed time per iteration (ms): 14507.8 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.526123E+00 | loss scale: 32768.0 | grad norm: 228145.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4318/ 159576 | consumed samples: 90928 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.554380E+00 | loss scale: 32768.0 | grad norm: 215247.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 19:07:09] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 19:07:09] PULSE: tr8-104B is running for 13:14:58 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 4319/ 159576 | consumed samples: 90960 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.519E-05 | global batch size: 32 | lm loss: 6.312326E+00 | loss scale: 32768.0 | grad norm: 214751.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4320/ 159576 | consumed samples: 90992 | elapsed time per iteration (ms): 14941.6 | learning rate: 2.520E-05 | global batch size: 32 | lm loss: 6.452510E+00 | loss scale: 32768.0 | grad norm: 260142.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4321/ 159576 | consumed samples: 91024 | elapsed time per iteration (ms): 14618.7 | learning rate: 2.521E-05 |
global batch size: 32 | lm loss: 6.420647E+00 | loss scale: 32768.0 | grad norm: 225655.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4322/ 159576 | consumed samples: 91056 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.522E-05 | global batch size: 32 | lm loss: 6.402806E+00 | loss scale: 32768.0 | grad norm: 291928.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4323/ 159576 | consumed samples: 91088 | elapsed time per iteration (ms): 14498.7 | learning rate: 2.523E-05 | global batch size: 32 | lm loss: 6.391022E+00 | loss scale: 32768.0 | grad norm: 237551.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4324/ 159576 | consumed samples: 91120 | elapsed time per iteration (ms): 15211.7 | learning rate: 2.524E-05 | global batch size: 32 | lm loss: 6.430393E+00 | loss scale: 32768.0 | grad norm: 234733.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4325/ 159576 | consumed samples: 91152 | elapsed time per iteration (ms): 14439.1 | learning rate: 2.525E-05 | global batch size: 32 | lm loss: 6.406878E+00 | loss scale: 32768.0 | grad norm: 212091.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4326/ 159576 | consumed samples: 91184 | elapsed time per iteration (ms): 14533.1 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.439167E+00 | loss scale: 32768.0 | grad norm: 244000.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4327/ 159576 | consumed samples: 91216 | elapsed time per iteration (ms): 14508.9 | learning rate: 2.526E-05 | global batch size: 32 | lm loss: 6.334565E+00 | loss scale: 32768.0 | grad norm: 183767.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4328/ 159576 | consumed samples: 91248 | elapsed time per iteration (ms): 14921.5 | learning rate: 2.527E-05 | global batch size: 32 | lm loss: 6.456017E+00 | loss scale: 32768.0 | grad norm: 239736.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4329/ 159576 | consumed samples: 91280 | elapsed time per iteration (ms): 14572.2 | learning rate: 2.528E-05 | global batch size: 32 | lm loss: 6.367092E+00 | loss scale: 32768.0 | grad norm: 195126.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4330/ 159576 | consumed samples: 91312 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.529E-05 | global batch size: 32 | lm loss: 6.383262E+00 | loss scale: 32768.0 | grad norm: 208256.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4331/ 159576 | consumed samples: 91344 | elapsed time per iteration (ms): 14591.9 | learning rate: 2.530E-05 | global batch size: 32 | lm loss: 6.502596E+00 | loss scale: 32768.0 | grad norm: 248824.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4332/ 159576 | consumed samples: 91376 | elapsed time per iteration (ms): 14794.2 | learning rate: 2.531E-05 | global batch size: 32 | lm loss: 6.386366E+00 | loss scale: 32768.0 | grad norm: 223413.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
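(Aside.) The loss scale visible in these records doubles from 16384.0 to 32768.0 at iteration 4256 and then holds there, with no skipped or NaN iterations reported on either side. That is the signature of dynamic loss scaling in mixed-precision training: the scale is cut back whenever the scaled gradients overflow (such a step shows up as a skipped iteration) and is doubled again after a fixed window of overflow-free steps. A minimal sketch of the generic mechanism, assuming a growth interval of 1000 steps -- the actual window and policy used for this run are not recorded in the log, and this is not the Megatron/DeepSpeed implementation itself:

class DynamicLossScaler:
    """Illustrative dynamic loss scaler: halve on overflow, double after
    growth_interval consecutive overflow-free steps."""

    def __init__(self, init_scale=16384.0, growth_interval=1000, factor=2.0):
        self.scale = init_scale
        self.growth_interval = growth_interval  # assumed window, see note above
        self.factor = factor
        self._good_steps = 0

    def update(self, found_overflow):
        """Return True if this step should be skipped (overflow detected)."""
        if found_overflow:
            self.scale /= self.factor   # back off; the step is skipped
            self._good_steps = 0
            return True
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            self.scale *= self.factor   # e.g. 16384.0 -> 32768.0, as at iteration 4256
            self._good_steps = 0
        return False

Under this policy, a long stretch of stable steps like the one logged above is exactly what lets the scale grow, and the larger scale in turn explains why the reported grad norms roughly double from iteration 4256 onward (the logged norm is taken on scaled gradients).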
iteration 4333/ 159576 | consumed samples: 91408 | elapsed time per iteration (ms): 14447.8 | learning rate: 2.532E-05 | global batch size: 32 | lm loss: 6.470964E+00 | loss scale: 32768.0 | grad norm: 220869.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4334/ 159576 | consumed samples: 91440 | elapsed time per iteration (ms): 14523.5 | learning rate: 2.533E-05 | global batch size: 32 | lm loss: 6.423388E+00 | loss scale: 32768.0 | grad norm: 204896.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4335/ 159576 | consumed samples: 91472 | elapsed time per iteration (ms): 14548.8 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.516037E+00 | loss scale: 32768.0 | grad norm: 214455.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4336/ 159576 | consumed samples: 91504 | elapsed time per iteration (ms): 14925.7 | learning rate: 2.534E-05 | global batch size: 32 | lm loss: 6.420337E+00 | loss scale: 32768.0 | grad norm: 252272.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4337/ 159576 | consumed samples: 91536 | elapsed time per iteration (ms): 14576.6 | learning rate: 2.535E-05 | global batch size: 32 | lm loss: 6.464952E+00 | loss scale: 32768.0 | grad norm: 193893.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4338/ 159576 | consumed samples: 91568 | elapsed time per iteration (ms): 14502.1 | learning rate: 2.536E-05 | global batch size: 32 | lm loss: 6.492158E+00 | loss scale: 32768.0 | grad norm: 243709.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4339/ 159576 | consumed samples: 91600 | elapsed time per iteration (ms): 14503.5 | learning rate: 2.537E-05 | global batch size: 32 | lm loss: 6.239275E+00 | loss scale: 32768.0 | grad norm: 206242.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4340/ 159576 | consumed samples: 91632 | elapsed time per iteration (ms): 14881.4 | learning rate: 2.538E-05 | global batch size: 32 | lm loss: 6.484446E+00 | loss scale: 32768.0 | grad norm: 213552.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4341/ 159576 | consumed samples: 91664 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.539E-05 | global batch size: 32 | lm loss: 6.419237E+00 | loss scale: 32768.0 | grad norm: 210520.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4342/ 159576 | consumed samples: 91696 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.540E-05 | global batch size: 32 | lm loss: 6.452721E+00 | loss scale: 32768.0 | grad norm: 238634.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4343/ 159576 | consumed samples: 91728 | elapsed time per iteration (ms): 14558.7 | learning rate: 2.541E-05 | global batch size: 32 | lm loss: 6.347074E+00 | loss scale: 32768.0 | grad norm: 202447.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4344/ 159576 | consumed samples: 91760 | elapsed time per iteration (ms): 14594.4 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.520543E+00 | loss scale: 32768.0 
| grad norm: 239073.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4345/ 159576 | consumed samples: 91792 | elapsed time per iteration (ms): 14908.5 | learning rate: 2.542E-05 | global batch size: 32 | lm loss: 6.421722E+00 | loss scale: 32768.0 | grad norm: 217284.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4346/ 159576 | consumed samples: 91824 | elapsed time per iteration (ms): 14533.0 | learning rate: 2.543E-05 | global batch size: 32 | lm loss: 6.272108E+00 | loss scale: 32768.0 | grad norm: 200271.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4347/ 159576 | consumed samples: 91856 | elapsed time per iteration (ms): 14569.7 | learning rate: 2.544E-05 | global batch size: 32 | lm loss: 6.532617E+00 | loss scale: 32768.0 | grad norm: 194761.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4348/ 159576 | consumed samples: 91888 | elapsed time per iteration (ms): 14475.9 | learning rate: 2.545E-05 | global batch size: 32 | lm loss: 6.471928E+00 | loss scale: 32768.0 | grad norm: 217213.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4349/ 159576 | consumed samples: 91920 | elapsed time per iteration (ms): 14760.6 | learning rate: 2.546E-05 | global batch size: 32 | lm loss: 6.416161E+00 | loss scale: 32768.0 | grad norm: 224313.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4350/ 159576 | consumed samples: 91952 | elapsed time per iteration (ms): 14554.3 | learning rate: 2.547E-05 | global batch size: 32 | lm loss: 6.550965E+00 | loss scale: 32768.0 | grad norm: 241887.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4351/ 159576 | consumed samples: 91984 | elapsed time per iteration (ms): 14563.9 | learning rate: 2.548E-05 | global batch size: 32 | lm loss: 6.496109E+00 | loss scale: 32768.0 | grad norm: 216683.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4352/ 159576 | consumed samples: 92016 | elapsed time per iteration (ms): 14514.3 | learning rate: 2.549E-05 | global batch size: 32 | lm loss: 6.359037E+00 | loss scale: 32768.0 | grad norm: 205500.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4353/ 159576 | consumed samples: 92048 | elapsed time per iteration (ms): 14703.1 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.333501E+00 | loss scale: 32768.0 | grad norm: 326501.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4354/ 159576 | consumed samples: 92080 | elapsed time per iteration (ms): 14558.2 | learning rate: 2.550E-05 | global batch size: 32 | lm loss: 6.455669E+00 | loss scale: 32768.0 | grad norm: 254904.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4355/ 159576 | consumed samples: 92112 | elapsed time per iteration (ms): 14511.5 | learning rate: 2.551E-05 | global batch size: 32 | lm loss: 6.509322E+00 | loss scale: 32768.0 | grad norm: 237041.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4356/ 159576 | consumed samples: 92144 | elapsed time per 
iteration (ms): 14539.0 | learning rate: 2.552E-05 | global batch size: 32 | lm loss: 6.356802E+00 | loss scale: 32768.0 | grad norm: 268871.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4357/ 159576 | consumed samples: 92176 | elapsed time per iteration (ms): 14822.4 | learning rate: 2.553E-05 | global batch size: 32 | lm loss: 6.599571E+00 | loss scale: 32768.0 | grad norm: 283473.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4358/ 159576 | consumed samples: 92208 | elapsed time per iteration (ms): 14612.7 | learning rate: 2.554E-05 | global batch size: 32 | lm loss: 6.308304E+00 | loss scale: 32768.0 | grad norm: 231784.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4359/ 159576 | consumed samples: 92240 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.555E-05 | global batch size: 32 | lm loss: 6.395612E+00 | loss scale: 32768.0 | grad norm: 270045.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4360/ 159576 | consumed samples: 92272 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.556E-05 | global batch size: 32 | lm loss: 6.525626E+00 | loss scale: 32768.0 | grad norm: 275256.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4361/ 159576 | consumed samples: 92304 | elapsed time per iteration (ms): 14951.2 | learning rate: 2.557E-05 | global batch size: 32 | lm loss: 6.457727E+00 | loss scale: 32768.0 | grad norm: 277346.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4362/ 159576 | consumed samples: 92336 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.423290E+00 | loss scale: 32768.0 | grad norm: 259149.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4363/ 159576 | consumed samples: 92368 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.558E-05 | global batch size: 32 | lm loss: 6.385529E+00 | loss scale: 32768.0 | grad norm: 288729.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4364/ 159576 | consumed samples: 92400 | elapsed time per iteration (ms): 14590.0 | learning rate: 2.559E-05 | global batch size: 32 | lm loss: 6.344237E+00 | loss scale: 32768.0 | grad norm: 224867.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4365/ 159576 | consumed samples: 92432 | elapsed time per iteration (ms): 15022.1 | learning rate: 2.560E-05 | global batch size: 32 | lm loss: 6.361878E+00 | loss scale: 32768.0 | grad norm: 317761.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4366/ 159576 | consumed samples: 92464 | elapsed time per iteration (ms): 14751.4 | learning rate: 2.561E-05 | global batch size: 32 | lm loss: 6.330537E+00 | loss scale: 32768.0 | grad norm: 265015.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4367/ 159576 | consumed samples: 92496 | elapsed time per iteration (ms): 14614.0 | learning rate: 2.562E-05 | global batch size: 32 | lm loss: 6.148376E+00 | loss scale: 32768.0 | grad norm: 264202.339 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4368/ 159576 | consumed samples: 92528 | elapsed time per iteration (ms): 14584.5 | learning rate: 2.563E-05 | global batch size: 32 | lm loss: 6.479382E+00 | loss scale: 32768.0 | grad norm: 264375.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4369/ 159576 | consumed samples: 92560 | elapsed time per iteration (ms): 14918.5 | learning rate: 2.564E-05 | global batch size: 32 | lm loss: 6.363014E+00 | loss scale: 32768.0 | grad norm: 226102.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4370/ 159576 | consumed samples: 92592 | elapsed time per iteration (ms): 14489.4 | learning rate: 2.565E-05 | global batch size: 32 | lm loss: 6.437625E+00 | loss scale: 32768.0 | grad norm: 280139.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4371/ 159576 | consumed samples: 92624 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.394330E+00 | loss scale: 32768.0 | grad norm: 290041.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4372/ 159576 | consumed samples: 92656 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.566E-05 | global batch size: 32 | lm loss: 6.430163E+00 | loss scale: 32768.0 | grad norm: 318528.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4373/ 159576 | consumed samples: 92688 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.567E-05 | global batch size: 32 | lm loss: 6.494810E+00 | loss scale: 32768.0 | grad norm: 279939.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4374/ 159576 | consumed samples: 92720 | elapsed time per iteration (ms): 14615.4 | learning rate: 2.568E-05 | global batch size: 32 | lm loss: 6.431265E+00 | loss scale: 32768.0 | grad norm: 260943.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4375/ 159576 | consumed samples: 92752 | elapsed time per iteration (ms): 14539.2 | learning rate: 2.569E-05 | global batch size: 32 | lm loss: 6.365846E+00 | loss scale: 32768.0 | grad norm: 614516.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4376/ 159576 | consumed samples: 92784 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.570E-05 | global batch size: 32 | lm loss: 6.306572E+00 | loss scale: 32768.0 | grad norm: 303539.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4377/ 159576 | consumed samples: 92816 | elapsed time per iteration (ms): 14894.6 | learning rate: 2.571E-05 | global batch size: 32 | lm loss: 6.444806E+00 | loss scale: 32768.0 | grad norm: 305405.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4378/ 159576 | consumed samples: 92848 | elapsed time per iteration (ms): 14498.0 | learning rate: 2.572E-05 | global batch size: 32 | lm loss: 6.475850E+00 | loss scale: 32768.0 | grad norm: 302245.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
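A note on two of the fields: "loss scale: 32768.0" together with zero skipped/nan iterations is fp16 dynamic loss scaling at work. The loss is multiplied by the scale before backward, gradients are unscaled before the optimizer step, and a step whose gradients overflow is skipped while the scale is reduced. Grad norm can still spike sharply without tripping an overflow, as at iteration 4375 above (614516.527 against a roughly 300k baseline). A minimal sketch of the usual scaling rule, with illustrative constants rather than this run's actual configuration:

```python
# Minimal sketch of fp16 dynamic loss scaling as commonly implemented:
# back off on overflow, grow again after a clean streak. The constants
# are illustrative assumptions, not this run's settings.
class DynamicLossScaler:
    def __init__(self, scale=2.0**15, growth_interval=1000):
        self.scale = scale              # 2**15 = 32768, the value in the log
        self.growth_interval = growth_interval
        self.clean_steps = 0
        self.skipped = 0                # "number of skipped iterations"

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if found_overflow:
            self.scale /= 2             # back off and skip this step
            self.clean_steps = 0
            self.skipped += 1
            return False
        self.clean_steps += 1
        if self.clean_steps % self.growth_interval == 0:
            self.scale *= 2             # try a larger scale again
        return True
```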
iteration 4379/ 159576 | consumed samples: 92880 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.573E-05 | global batch size: 32 | lm loss: 6.470803E+00 | loss scale: 32768.0 | grad norm: 302163.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4380/ 159576 | consumed samples: 92912 | elapsed time per iteration (ms): 14547.1 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.285831E+00 | loss scale: 32768.0 | grad norm: 245533.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4381/ 159576 | consumed samples: 92944 | elapsed time per iteration (ms): 14903.6 | learning rate: 2.574E-05 | global batch size: 32 | lm loss: 6.382543E+00 | loss scale: 32768.0 | grad norm: 256847.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4382/ 159576 | consumed samples: 92976 | elapsed time per iteration (ms): 14746.3 | learning rate: 2.575E-05 | global batch size: 32 | lm loss: 6.377112E+00 | loss scale: 32768.0 | grad norm: 234822.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4383/ 159576 | consumed samples: 93008 | elapsed time per iteration (ms): 14580.0 | learning rate: 2.576E-05 | global batch size: 32 | lm loss: 6.412641E+00 | loss scale: 32768.0 | grad norm: 343040.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4384/ 159576 | consumed samples: 93040 | elapsed time per iteration (ms): 14506.7 | learning rate: 2.577E-05 | global batch size: 32 | lm loss: 6.416348E+00 | loss scale: 32768.0 | grad norm: 291818.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4385/ 159576 | consumed samples: 93072 | elapsed time per iteration (ms): 14512.2 | learning rate: 2.578E-05 | global batch size: 32 | lm loss: 6.425752E+00 | loss scale: 32768.0 | grad norm: 323662.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4386/ 159576 | consumed samples: 93104 | elapsed time per iteration (ms): 14928.6 | learning rate: 2.579E-05 | global batch size: 32 | lm loss: 6.318911E+00 | loss scale: 32768.0 | grad norm: 305616.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4387/ 159576 | consumed samples: 93136 | elapsed time per iteration (ms): 14506.3 | learning rate: 2.580E-05 | global batch size: 32 | lm loss: 6.531947E+00 | loss scale: 32768.0 | grad norm: 350201.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4388/ 159576 | consumed samples: 93168 | elapsed time per iteration (ms): 14556.8 | learning rate: 2.581E-05 | global batch size: 32 | lm loss: 6.376329E+00 | loss scale: 32768.0 | grad norm: 345044.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4389/ 159576 | consumed samples: 93200 | elapsed time per iteration (ms): 14537.0 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.381351E+00 | loss scale: 32768.0 | grad norm: 285108.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4390/ 159576 | consumed samples: 93232 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.582E-05 | global batch size: 32 | lm loss: 6.367733E+00 | loss scale: 32768.0 | grad norm: 443607.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration
4391/ 159576 | consumed samples: 93264 | elapsed time per iteration (ms): 14536.7 | learning rate: 2.583E-05 | global batch size: 32 | lm loss: 6.404822E+00 | loss scale: 32768.0 | grad norm: 266018.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4392/ 159576 | consumed samples: 93296 | elapsed time per iteration (ms): 14465.3 | learning rate: 2.584E-05 | global batch size: 32 | lm loss: 6.460493E+00 | loss scale: 32768.0 | grad norm: 388305.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4393/ 159576 | consumed samples: 93328 | elapsed time per iteration (ms): 14549.7 | learning rate: 2.585E-05 | global batch size: 32 | lm loss: 6.312160E+00 | loss scale: 32768.0 | grad norm: 289444.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4394/ 159576 | consumed samples: 93360 | elapsed time per iteration (ms): 14712.4 | learning rate: 2.586E-05 | global batch size: 32 | lm loss: 6.447091E+00 | loss scale: 32768.0 | grad norm: 310866.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4395/ 159576 | consumed samples: 93392 | elapsed time per iteration (ms): 14507.9 | learning rate: 2.587E-05 | global batch size: 32 | lm loss: 6.358830E+00 | loss scale: 32768.0 | grad norm: 254147.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4396/ 159576 | consumed samples: 93424 | elapsed time per iteration (ms): 14549.6 | learning rate: 2.588E-05 | global batch size: 32 | lm loss: 6.406147E+00 | loss scale: 32768.0 | grad norm: 368220.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4397/ 159576 | consumed samples: 93456 | elapsed time per iteration (ms): 14535.1 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.511951E+00 | loss scale: 32768.0 | grad norm: 306021.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4398/ 159576 | consumed samples: 93488 | elapsed time per iteration (ms): 14834.9 | learning rate: 2.589E-05 | global batch size: 32 | lm loss: 6.344939E+00 | loss scale: 32768.0 | grad norm: 244440.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4399/ 159576 | consumed samples: 93520 | elapsed time per iteration (ms): 14561.9 | learning rate: 2.590E-05 | global batch size: 32 | lm loss: 6.408576E+00 | loss scale: 32768.0 | grad norm: 331789.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4400/ 159576 | consumed samples: 93552 | elapsed time per iteration (ms): 14527.0 | learning rate: 2.591E-05 | global batch size: 32 | lm loss: 6.405599E+00 | loss scale: 32768.0 | grad norm: 389927.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4401/ 159576 | consumed samples: 93584 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.592E-05 | global batch size: 32 | lm loss: 6.461980E+00 | loss scale: 32768.0 | grad norm: 344518.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4402/ 159576 | consumed samples: 93616 | elapsed time per iteration (ms): 15042.1 | learning rate: 2.593E-05 | global batch size: 32 | lm loss: 6.416601E+00 | loss scale: 32768.0 | grad 
norm: 310590.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4403/ 159576 | consumed samples: 93648 | elapsed time per iteration (ms): 14634.8 | learning rate: 2.594E-05 | global batch size: 32 | lm loss: 6.546180E+00 | loss scale: 32768.0 | grad norm: 267385.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4404/ 159576 | consumed samples: 93680 | elapsed time per iteration (ms): 14549.2 | learning rate: 2.595E-05 | global batch size: 32 | lm loss: 6.399436E+00 | loss scale: 32768.0 | grad norm: 298662.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4405/ 159576 | consumed samples: 93712 | elapsed time per iteration (ms): 14489.5 | learning rate: 2.596E-05 | global batch size: 32 | lm loss: 6.306044E+00 | loss scale: 32768.0 | grad norm: 302499.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4406/ 159576 | consumed samples: 93744 | elapsed time per iteration (ms): 14963.1 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.504598E+00 | loss scale: 32768.0 | grad norm: 315577.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4407/ 159576 | consumed samples: 93776 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.597E-05 | global batch size: 32 | lm loss: 6.229925E+00 | loss scale: 32768.0 | grad norm: 238182.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4408/ 159576 | consumed samples: 93808 | elapsed time per iteration (ms): 14496.6 | learning rate: 2.598E-05 | global batch size: 32 | lm loss: 6.414362E+00 | loss scale: 32768.0 | grad norm: 274509.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4409/ 159576 | consumed samples: 93840 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.599E-05 | global batch size: 32 | lm loss: 6.355350E+00 | loss scale: 32768.0 | grad norm: 288329.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4410/ 159576 | consumed samples: 93872 | elapsed time per iteration (ms): 14875.5 | learning rate: 2.600E-05 | global batch size: 32 | lm loss: 6.366935E+00 | loss scale: 32768.0 | grad norm: 252983.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4411/ 159576 | consumed samples: 93904 | elapsed time per iteration (ms): 14456.2 | learning rate: 2.601E-05 | global batch size: 32 | lm loss: 6.458515E+00 | loss scale: 32768.0 | grad norm: 210575.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4412/ 159576 | consumed samples: 93936 | elapsed time per iteration (ms): 14560.7 | learning rate: 2.602E-05 | global batch size: 32 | lm loss: 6.472146E+00 | loss scale: 32768.0 | grad norm: 237114.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4413/ 159576 | consumed samples: 93968 | elapsed time per iteration (ms): 14587.5 | learning rate: 2.603E-05 | global batch size: 32 | lm loss: 6.359771E+00 | loss scale: 32768.0 | grad norm: 252911.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4414/ 159576 | consumed samples: 94000 | elapsed time per iteration 
(ms): 14804.6 | learning rate: 2.604E-05 | global batch size: 32 | lm loss: 6.563889E+00 | loss scale: 32768.0 | grad norm: 296794.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4415/ 159576 | consumed samples: 94032 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.413787E+00 | loss scale: 32768.0 | grad norm: 272034.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4416/ 159576 | consumed samples: 94064 | elapsed time per iteration (ms): 14494.5 | learning rate: 2.605E-05 | global batch size: 32 | lm loss: 6.443899E+00 | loss scale: 32768.0 | grad norm: 290284.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4417/ 159576 | consumed samples: 94096 | elapsed time per iteration (ms): 14536.8 | learning rate: 2.606E-05 | global batch size: 32 | lm loss: 6.472334E+00 | loss scale: 32768.0 | grad norm: 248961.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4418/ 159576 | consumed samples: 94128 | elapsed time per iteration (ms): 14975.6 | learning rate: 2.607E-05 | global batch size: 32 | lm loss: 6.557878E+00 | loss scale: 32768.0 | grad norm: 330814.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4419/ 159576 | consumed samples: 94160 | elapsed time per iteration (ms): 14477.8 | learning rate: 2.608E-05 | global batch size: 32 | lm loss: 6.499488E+00 | loss scale: 32768.0 | grad norm: 268804.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4420/ 159576 | consumed samples: 94192 | elapsed time per iteration (ms): 14628.8 | learning rate: 2.609E-05 | global batch size: 32 | lm loss: 6.312944E+00 | loss scale: 32768.0 | grad norm: 264253.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4421/ 159576 | consumed samples: 94224 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.610E-05 | global batch size: 32 | lm loss: 6.392362E+00 | loss scale: 32768.0 | grad norm: 255470.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4422/ 159576 | consumed samples: 94256 | elapsed time per iteration (ms): 14805.5 | learning rate: 2.611E-05 | global batch size: 32 | lm loss: 6.375703E+00 | loss scale: 32768.0 | grad norm: 246267.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4423/ 159576 | consumed samples: 94288 | elapsed time per iteration (ms): 14680.3 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.523773E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4424/ 159576 | consumed samples: 94320 | elapsed time per iteration (ms): 7706.4 | learning rate: 2.612E-05 | global batch size: 32 | lm loss: 6.355268E+00 | loss scale: 32768.0 | grad norm: 281090.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4425/ 159576 | consumed samples: 94352 | elapsed time per iteration (ms): 13992.5 | learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.391113E+00 | loss scale: 32768.0 | grad norm: 235806.214 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms)
iteration 4426/ 159576 | consumed samples: 94384 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.613E-05 | global batch size: 32 | lm loss: 6.483145E+00 | loss scale: 32768.0 | grad norm: 316001.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4427/ 159576 | consumed samples: 94416 | elapsed time per iteration (ms): 14931.0 | learning rate: 2.614E-05 | global batch size: 32 | lm loss: 6.419625E+00 | loss scale: 32768.0 | grad norm: 595148.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4428/ 159576 | consumed samples: 94448 | elapsed time per iteration (ms): 14542.3 | learning rate: 2.615E-05 | global batch size: 32 | lm loss: 6.463273E+00 | loss scale: 32768.0 | grad norm: 310708.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4429/ 159576 | consumed samples: 94480 | elapsed time per iteration (ms): 14522.5 | learning rate: 2.616E-05 | global batch size: 32 | lm loss: 6.427548E+00 | loss scale: 32768.0 | grad norm: 324018.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4430/ 159576 | consumed samples: 94512 | elapsed time per iteration (ms): 14489.9 | learning rate: 2.617E-05 | global batch size: 32 | lm loss: 6.385033E+00 | loss scale: 32768.0 | grad norm: 244981.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4431/ 159576 | consumed samples: 94560 | elapsed time per iteration (ms): 15763.7 | learning rate: 2.618E-05 | global batch size: 48 | lm loss: 6.545300E+00 | loss scale: 32768.0 | grad norm: 209680.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4432/ 159576 | consumed samples: 94608 | elapsed time per iteration (ms): 15487.4 | learning rate: 2.620E-05 | global batch size: 48 | lm loss: 6.439948E+00 | loss scale: 32768.0 | grad norm: 242738.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4433/ 159576 | consumed samples: 94656 | elapsed time per iteration (ms): 15516.6 | learning rate: 2.621E-05 | global batch size: 48 | lm loss: 6.392755E+00 | loss scale: 32768.0 | grad norm: 221617.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4434/ 159576 | consumed samples: 94704 | elapsed time per iteration (ms): 15531.5 | learning rate: 2.622E-05 | global batch size: 48 | lm loss: 6.430658E+00 | loss scale: 32768.0 | grad norm: 237786.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4435/ 159576 | consumed samples: 94752 | elapsed time per iteration (ms): 15905.6 | learning rate: 2.624E-05 | global batch size: 48 | lm loss: 6.556681E+00 | loss scale: 32768.0 | grad norm: 268817.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4436/ 159576 | consumed samples: 94800 | elapsed time per iteration (ms): 15557.4 | learning rate: 2.625E-05 | global batch size: 48 | lm loss: 6.284402E+00 | loss scale: 32768.0 | grad norm: 217583.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
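At iteration 4431 the global batch size steps from 32 to 48 and consumed samples start advancing by 48 per iteration. The learning rate also begins climbing in larger per-iteration increments (2.617E-05, 2.618E-05, 2.620E-05, ...), consistent with warmup being indexed by consumed samples rather than by iteration count. This is the shape of a Megatron-style --rampup-batch-size schedule; the sketch below uses illustrative start/increment/ramp values that happen to reproduce the 32 -> 48 step seen here, since the run's actual arguments are not visible in this window.

```python
# Minimal sketch of a Megatron-style --rampup-batch-size schedule:
# the global batch size starts small and grows by a fixed increment
# every ramp_samples / num_increments consumed samples. The defaults
# below are illustrative; only the 32 -> 48 step is visible in the log.
def global_batch_size(consumed_samples, start=16, increment=16,
                      final=2048, ramp_samples=6_000_000):
    num_increments = (final - start) // increment
    samples_per_increment = ramp_samples / num_increments
    steps = int(consumed_samples / samples_per_increment)
    return min(final, start + steps * increment)

# Batch size is computed from samples consumed *before* the iteration:
print(global_batch_size(94_464))  # -> 32 (iteration 4430 in the log)
print(global_batch_size(94_512))  # -> 48 (iteration 4431 in the log)
```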
iteration 4437/ 159576 | consumed samples: 94848 | elapsed time per iteration (ms): 15418.7 | learning rate: 2.626E-05 | global batch size: 48 | lm loss: 6.449813E+00 | loss scale: 32768.0 | grad norm: 250831.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4438/ 159576 | consumed samples: 94896 | elapsed time per iteration (ms): 15465.2 | learning rate: 2.628E-05 | global batch size: 48 | lm loss: 6.524204E+00 | loss scale: 32768.0 | grad norm: 237741.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4439/ 159576 | consumed samples: 94944 | elapsed time per iteration (ms): 15664.4 | learning rate: 2.629E-05 | global batch size: 48 | lm loss: 6.426958E+00 | loss scale: 32768.0 | grad norm: 275670.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4440/ 159576 | consumed samples: 94992 | elapsed time per iteration (ms): 15485.6 | learning rate: 2.630E-05 | global batch size: 48 | lm loss: 6.312765E+00 | loss scale: 32768.0 | grad norm: 236643.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4441/ 159576 | consumed samples: 95040 | elapsed time per iteration (ms): 15554.2 | learning rate: 2.632E-05 | global batch size: 48 | lm loss: 6.353696E+00 | loss scale: 32768.0 | grad norm: 244108.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4442/ 159576 | consumed samples: 95088 | elapsed time per iteration (ms): 15559.7 | learning rate: 2.633E-05 | global batch size: 48 | lm loss: 6.390371E+00 | loss scale: 32768.0 | grad norm: 415315.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4443/ 159576 | consumed samples: 95136 | elapsed time per iteration (ms): 15762.5 | learning rate: 2.634E-05 | global batch size: 48 | lm loss: 6.406565E+00 | loss scale: 32768.0 | grad norm: 379916.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4444/ 159576 | consumed samples: 95184 | elapsed time per iteration (ms): 15453.3 | learning rate: 2.636E-05 | global batch size: 48 | lm loss: 6.429417E+00 | loss scale: 32768.0 | grad norm: 221219.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4445/ 159576 | consumed samples: 95232 | elapsed time per iteration (ms): 15417.8 | learning rate: 2.637E-05 | global batch size: 48 | lm loss: 6.443903E+00 | loss scale: 32768.0 | grad norm: 296633.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4446/ 159576 | consumed samples: 95280 | elapsed time per iteration (ms): 15443.7 | learning rate: 2.638E-05 | global batch size: 48 | lm loss: 6.532698E+00 | loss scale: 32768.0 | grad norm: 269367.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4447/ 159576 | consumed samples: 95328 | elapsed time per iteration (ms): 15690.5 | learning rate: 2.640E-05 | global batch size: 48 | lm loss: 6.390007E+00 | loss scale: 32768.0 | grad norm: 235234.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4448/ 159576 | consumed samples: 95376 | elapsed time per iteration (ms): 15488.0 | learning rate: 2.641E-05 | global batch size: 48 | lm loss: 6.393896E+00 | loss scale: 32768.0 | grad norm: 210963.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4449/ 159576 |
consumed samples: 95424 | elapsed time per iteration (ms): 15546.6 | learning rate: 2.642E-05 | global batch size: 48 | lm loss: 6.387472E+00 | loss scale: 32768.0 | grad norm: 214989.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4450/ 159576 | consumed samples: 95472 | elapsed time per iteration (ms): 15940.5 | learning rate: 2.644E-05 | global batch size: 48 | lm loss: 6.395288E+00 | loss scale: 32768.0 | grad norm: 214649.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4451/ 159576 | consumed samples: 95520 | elapsed time per iteration (ms): 15450.6 | learning rate: 2.645E-05 | global batch size: 48 | lm loss: 6.391924E+00 | loss scale: 32768.0 | grad norm: 256872.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4452/ 159576 | consumed samples: 95568 | elapsed time per iteration (ms): 15411.8 | learning rate: 2.646E-05 | global batch size: 48 | lm loss: 6.372116E+00 | loss scale: 32768.0 | grad norm: 227618.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4453/ 159576 | consumed samples: 95616 | elapsed time per iteration (ms): 15430.5 | learning rate: 2.648E-05 | global batch size: 48 | lm loss: 6.411846E+00 | loss scale: 32768.0 | grad norm: 239941.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4454/ 159576 | consumed samples: 95664 | elapsed time per iteration (ms): 15763.6 | learning rate: 2.649E-05 | global batch size: 48 | lm loss: 6.412562E+00 | loss scale: 32768.0 | grad norm: 229907.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4455/ 159576 | consumed samples: 95712 | elapsed time per iteration (ms): 15524.7 | learning rate: 2.650E-05 | global batch size: 48 | lm loss: 6.428136E+00 | loss scale: 32768.0 | grad norm: 223866.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4456/ 159576 | consumed samples: 95760 | elapsed time per iteration (ms): 15490.3 | learning rate: 2.652E-05 | global batch size: 48 | lm loss: 6.476852E+00 | loss scale: 32768.0 | grad norm: 263813.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4457/ 159576 | consumed samples: 95808 | elapsed time per iteration (ms): 15514.4 | learning rate: 2.653E-05 | global batch size: 48 | lm loss: 6.382901E+00 | loss scale: 32768.0 | grad norm: 257590.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4458/ 159576 | consumed samples: 95856 | elapsed time per iteration (ms): 15907.9 | learning rate: 2.654E-05 | global batch size: 48 | lm loss: 6.444118E+00 | loss scale: 32768.0 | grad norm: 236507.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4459/ 159576 | consumed samples: 95904 | elapsed time per iteration (ms): 15454.4 | learning rate: 2.656E-05 | global batch size: 48 | lm loss: 6.392717E+00 | loss scale: 32768.0 | grad norm: 227300.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4460/ 159576 | consumed samples: 95952 | elapsed time per iteration (ms): 15435.7 | learning rate: 2.657E-05 | global batch size: 48 | lm loss: 6.375526E+00 | loss scale: 32768.0 | grad norm: 217329.765 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4461/ 159576 | consumed samples: 96000 | elapsed time per iteration (ms): 15463.0 | learning rate: 2.658E-05 | global batch size: 48 | lm loss: 6.442908E+00 | loss scale: 32768.0 | grad norm: 210214.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4462/ 159576 | consumed samples: 96048 | elapsed time per iteration (ms): 15890.8 | learning rate: 2.660E-05 | global batch size: 48 | lm loss: 6.347652E+00 | loss scale: 32768.0 | grad norm: 241592.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4463/ 159576 | consumed samples: 96096 | elapsed time per iteration (ms): 15523.3 | learning rate: 2.661E-05 | global batch size: 48 | lm loss: 6.408596E+00 | loss scale: 32768.0 | grad norm: 286741.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4464/ 159576 | consumed samples: 96144 | elapsed time per iteration (ms): 15484.1 | learning rate: 2.662E-05 | global batch size: 48 | lm loss: 6.423483E+00 | loss scale: 32768.0 | grad norm: 227347.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4465/ 159576 | consumed samples: 96192 | elapsed time per iteration (ms): 15505.4 | learning rate: 2.664E-05 | global batch size: 48 | lm loss: 6.465323E+00 | loss scale: 32768.0 | grad norm: 278891.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4466/ 159576 | consumed samples: 96240 | elapsed time per iteration (ms): 15734.3 | learning rate: 2.665E-05 | global batch size: 48 | lm loss: 6.540909E+00 | loss scale: 32768.0 | grad norm: 271330.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4467/ 159576 | consumed samples: 96288 | elapsed time per iteration (ms): 15463.2 | learning rate: 2.666E-05 | global batch size: 48 | lm loss: 6.366038E+00 | loss scale: 32768.0 | grad norm: 230305.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4468/ 159576 | consumed samples: 96336 | elapsed time per iteration (ms): 15456.1 | learning rate: 2.668E-05 | global batch size: 48 | lm loss: 6.383101E+00 | loss scale: 32768.0 | grad norm: 266194.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4469/ 159576 | consumed samples: 96384 | elapsed time per iteration (ms): 15450.4 | learning rate: 2.669E-05 | global batch size: 48 | lm loss: 6.383107E+00 | loss scale: 32768.0 | grad norm: 224990.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4470/ 159576 | consumed samples: 96432 | elapsed time per iteration (ms): 15624.0 | learning rate: 2.670E-05 | global batch size: 48 | lm loss: 6.393697E+00 | loss scale: 32768.0 | grad norm: 301446.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4471/ 159576 | consumed samples: 96480 | elapsed time per iteration (ms): 15530.2 | learning rate: 2.672E-05 | global batch size: 48 | lm loss: 6.364079E+00 | loss scale: 32768.0 | grad norm: 215922.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4472/ 159576 | consumed samples: 96528 | elapsed time per iteration (ms): 15512.2 | 
learning rate: 2.673E-05 | global batch size: 48 | lm loss: 6.373242E+00 | loss scale: 32768.0 | grad norm: 297810.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4473/ 159576 | consumed samples: 96576 | elapsed time per iteration (ms): 15493.5 | learning rate: 2.674E-05 | global batch size: 48 | lm loss: 6.458824E+00 | loss scale: 32768.0 | grad norm: 253875.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4474/ 159576 | consumed samples: 96624 | elapsed time per iteration (ms): 16109.8 | learning rate: 2.676E-05 | global batch size: 48 | lm loss: 6.444027E+00 | loss scale: 32768.0 | grad norm: 235767.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4475/ 159576 | consumed samples: 96672 | elapsed time per iteration (ms): 15442.4 | learning rate: 2.677E-05 | global batch size: 48 | lm loss: 6.379702E+00 | loss scale: 32768.0 | grad norm: 200816.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4476/ 159576 | consumed samples: 96720 | elapsed time per iteration (ms): 15439.1 | learning rate: 2.678E-05 | global batch size: 48 | lm loss: 6.460698E+00 | loss scale: 32768.0 | grad norm: 243887.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4477/ 159576 | consumed samples: 96768 | elapsed time per iteration (ms): 15842.8 | learning rate: 2.680E-05 | global batch size: 48 | lm loss: 6.425824E+00 | loss scale: 32768.0 | grad norm: 194209.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4478/ 159576 | consumed samples: 96816 | elapsed time per iteration (ms): 15527.8 | learning rate: 2.681E-05 | global batch size: 48 | lm loss: 6.499928E+00 | loss scale: 32768.0 | grad norm: 205164.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4479/ 159576 | consumed samples: 96864 | elapsed time per iteration (ms): 15497.3 | learning rate: 2.682E-05 | global batch size: 48 | lm loss: 6.333491E+00 | loss scale: 32768.0 | grad norm: 198136.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4480/ 159576 | consumed samples: 96912 | elapsed time per iteration (ms): 15608.5 | learning rate: 2.684E-05 | global batch size: 48 | lm loss: 6.393649E+00 | loss scale: 32768.0 | grad norm: 226765.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4481/ 159576 | consumed samples: 96960 | elapsed time per iteration (ms): 15886.4 | learning rate: 2.685E-05 | global batch size: 48 | lm loss: 6.315465E+00 | loss scale: 32768.0 | grad norm: 233990.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4482/ 159576 | consumed samples: 97008 | elapsed time per iteration (ms): 15388.4 | learning rate: 2.686E-05 | global batch size: 48 | lm loss: 6.467194E+00 | loss scale: 32768.0 | grad norm: 253595.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4483/ 159576 | consumed samples: 97056 | elapsed time per iteration (ms): 15452.6 | learning rate: 2.688E-05 | global batch size: 48 | lm loss: 6.424766E+00 | loss scale: 32768.0 | grad norm: 243792.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 4484/ 159576 | consumed samples: 97104 | elapsed time per iteration (ms): 15440.8 | learning rate: 2.689E-05 | global batch size: 48 | lm loss: 6.382202E+00 | loss scale: 32768.0 | grad norm: 253619.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4485/ 159576 | consumed samples: 97152 | elapsed time per iteration (ms): 15758.4 | learning rate: 2.690E-05 | global batch size: 48 | lm loss: 6.420368E+00 | loss scale: 32768.0 | grad norm: 270122.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4486/ 159576 | consumed samples: 97200 | elapsed time per iteration (ms): 15504.2 | learning rate: 2.692E-05 | global batch size: 48 | lm loss: 6.341059E+00 | loss scale: 32768.0 | grad norm: 264076.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4487/ 159576 | consumed samples: 97248 | elapsed time per iteration (ms): 15564.4 | learning rate: 2.693E-05 | global batch size: 48 | lm loss: 6.351835E+00 | loss scale: 32768.0 | grad norm: 254803.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4488/ 159576 | consumed samples: 97296 | elapsed time per iteration (ms): 15603.6 | learning rate: 2.694E-05 | global batch size: 48 | lm loss: 6.344017E+00 | loss scale: 32768.0 | grad norm: 244790.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4489/ 159576 | consumed samples: 97344 | elapsed time per iteration (ms): 15804.2 | learning rate: 2.696E-05 | global batch size: 48 | lm loss: 6.487484E+00 | loss scale: 32768.0 | grad norm: 242539.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4490/ 159576 | consumed samples: 97392 | elapsed time per iteration (ms): 15547.3 | learning rate: 2.697E-05 | global batch size: 48 | lm loss: 6.339984E+00 | loss scale: 32768.0 | grad norm: 225575.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4491/ 159576 | consumed samples: 97440 | elapsed time per iteration (ms): 15475.7 | learning rate: 2.698E-05 | global batch size: 48 | lm loss: 6.449341E+00 | loss scale: 32768.0 | grad norm: 205395.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4492/ 159576 | consumed samples: 97488 | elapsed time per iteration (ms): 15436.0 | learning rate: 2.700E-05 | global batch size: 48 | lm loss: 6.382250E+00 | loss scale: 32768.0 | grad norm: 234078.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4493/ 159576 | consumed samples: 97536 | elapsed time per iteration (ms): 15764.8 | learning rate: 2.701E-05 | global batch size: 48 | lm loss: 6.425200E+00 | loss scale: 32768.0 | grad norm: 247476.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4494/ 159576 | consumed samples: 97584 | elapsed time per iteration (ms): 15532.5 | learning rate: 2.702E-05 | global batch size: 48 | lm loss: 6.381852E+00 | loss scale: 32768.0 | grad norm: 242648.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4495/ 159576 | consumed samples: 97632 | elapsed time per iteration (ms): 15533.1 | learning rate: 2.704E-05 | global batch size: 48 | lm loss: 
6.230868E+00 | loss scale: 32768.0 | grad norm: 219731.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4496/ 159576 | consumed samples: 97680 | elapsed time per iteration (ms): 15535.3 | learning rate: 2.705E-05 | global batch size: 48 | lm loss: 6.353293E+00 | loss scale: 32768.0 | grad norm: 216013.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4497/ 159576 | consumed samples: 97728 | elapsed time per iteration (ms): 15701.4 | learning rate: 2.706E-05 | global batch size: 48 | lm loss: 6.307485E+00 | loss scale: 32768.0 | grad norm: 253204.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4498/ 159576 | consumed samples: 97776 | elapsed time per iteration (ms): 15494.7 | learning rate: 2.708E-05 | global batch size: 48 | lm loss: 6.438371E+00 | loss scale: 32768.0 | grad norm: 214787.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4499/ 159576 | consumed samples: 97824 | elapsed time per iteration (ms): 15439.3 | learning rate: 2.709E-05 | global batch size: 48 | lm loss: 6.311467E+00 | loss scale: 32768.0 | grad norm: 216077.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4500/ 159576 | consumed samples: 97872 | elapsed time per iteration (ms): 15685.9 | learning rate: 2.710E-05 | global batch size: 48 | lm loss: 6.423208E+00 | loss scale: 32768.0 | grad norm: 207994.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-24 19:52:27,219] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step4500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17750.13
iteration 4501/ 159576 | consumed samples: 97920 | elapsed time per iteration (ms): 33282.4 | learning rate: 2.712E-05 | global batch size: 48 | lm loss: 6.402827E+00 | loss scale: 32768.0 | grad norm: 259915.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4502/ 159576 | consumed samples: 97968 | elapsed time per iteration (ms): 15581.1 | learning rate: 2.713E-05 | global batch size: 48 | lm loss: 6.310410E+00 | loss scale: 32768.0 | grad norm: 222384.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4503/ 159576 | consumed samples: 98016 | elapsed time per iteration (ms): 15856.7 | learning rate: 2.714E-05 | global batch size: 48 | lm loss: 6.259107E+00 | loss scale: 32768.0 | grad norm: 219981.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4504/ 159576 | consumed samples: 98064 | elapsed time per iteration (ms): 15522.8 | learning rate: 2.716E-05 | global batch size: 48 | lm loss: 6.441791E+00 | loss scale: 32768.0 | grad norm: 235487.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
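The save at iteration 4500 cost 17750.13 ms ("time (ms) | save-checkpoint"), and it is visible in the next record: iteration 4501's elapsed time jumps to 33282.4 ms against a roughly 15.5 s steady state, i.e. a normal step plus the save. Assuming checkpoints land every 500 iterations (only this one save is visible in this window, so the interval is an inference from the round number), the amortized cost is small:

```python
# Back-of-envelope: amortized checkpoint overhead, assuming a fixed
# save interval of 500 iterations (an assumption; only the iteration-4500
# save appears in this log window).
save_ms = 17750.13   # "time (ms) | save-checkpoint" from the log
step_ms = 15500.0    # typical elapsed time per iteration here
interval = 500

overhead = save_ms / (interval * step_ms)
print(f"~{overhead:.2%} of wall time spent saving")  # ~0.23%
```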
iteration 4505/ 159576 | consumed samples: 98112 | elapsed time per iteration (ms): 15475.3 | learning rate: 2.717E-05 | global batch size: 48 | lm loss: 6.431644E+00 | loss scale: 32768.0 | grad norm: 308152.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4506/ 159576 | consumed samples: 98160 | elapsed time per iteration (ms): 15475.2 | learning rate: 2.718E-05 | global batch size: 48 | lm loss: 6.437158E+00 | loss scale: 32768.0 | grad norm: 223087.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4507/ 159576 | consumed samples: 98208 | elapsed time per iteration (ms): 15919.3 | learning rate: 2.720E-05 | global batch size: 48 | lm loss: 6.456445E+00 | loss scale: 32768.0 | grad norm: 223422.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4508/ 159576 | consumed samples: 98256 | elapsed time per iteration (ms): 15503.1 | learning rate: 2.721E-05 | global batch size: 48 | lm loss: 6.409997E+00 | loss scale: 32768.0 | grad norm: 245785.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4509/ 159576 | consumed samples: 98304 | elapsed time per iteration (ms): 15512.1 | learning rate: 2.722E-05 | global batch size: 48 | lm loss: 6.441339E+00 | loss scale: 32768.0 | grad norm: 283619.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4510/ 159576 | consumed samples: 98352 | elapsed time per iteration (ms): 15548.0 | learning rate: 2.724E-05 | global batch size: 48 | lm loss: 6.441983E+00 | loss scale: 32768.0 | grad norm: 235037.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4511/ 159576 | consumed samples: 98400 | elapsed time per iteration (ms): 15735.6 | learning rate: 2.725E-05 | global batch size: 48 | lm loss: 6.499406E+00 | loss scale: 32768.0 | grad norm: 238925.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4512/ 159576 | consumed samples: 98448 | elapsed time per iteration (ms): 15495.6 | learning rate: 2.726E-05 | global batch size: 48 | lm loss: 6.429494E+00 | loss scale: 32768.0 | grad norm: 295604.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4513/ 159576 | consumed samples: 98496 | elapsed time per iteration (ms): 15481.9 | learning rate: 2.728E-05 | global batch size: 48 | lm loss: 6.407839E+00 | loss scale: 32768.0 | grad norm: 292842.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4514/ 159576 | consumed samples: 98544 | elapsed time per iteration (ms): 15479.3 | learning rate: 2.729E-05 | global batch size: 48 | lm loss: 6.440022E+00 | loss scale: 32768.0 | grad norm: 270315.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4515/ 159576 | consumed samples: 98592 | elapsed time per iteration (ms): 15606.8 | learning rate: 2.730E-05 | global batch size: 48 | lm loss: 6.391658E+00 | loss scale: 32768.0 | grad norm: 271519.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4516/ 159576 | consumed samples: 98640 | elapsed time per iteration (ms): 15492.8 | learning rate: 2.732E-05 | global batch size: 48 | lm loss: 6.445361E+00 | loss scale: 32768.0 | grad norm: 235853.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4517/ 159576 | consumed samples: 98688 | elapsed time per iteration
(ms): 15525.5 | learning rate: 2.733E-05 | global batch size: 48 | lm loss: 6.274318E+00 | loss scale: 32768.0 | grad norm: 246250.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)

Training log, one row per optimizer step. Every step in this span also logged: global batch size: 48 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0; the per-step "time (ms)" timers line was empty throughout.

iteration/159576 | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
4518 | 98736 | 15595.2 | 2.734E-05 | 6.378585E+00 | 32768.0 | 262163.945
4519 | 98784 | 15657.4 | 2.736E-05 | 6.398365E+00 | 32768.0 | 339087.705
4520 | 98832 | 15503.5 | 2.737E-05 | 6.435692E+00 | 32768.0 | 219944.197
4521 | 98880 | 15444.3 | 2.738E-05 | 6.418158E+00 | 32768.0 | 295809.324
4522 | 98928 | 15726.5 | 2.739E-05 | 6.317287E+00 | 32768.0 | 256139.821
4523 | 98976 | 15697.5 | 2.741E-05 | 6.210083E+00 | 32768.0 | 222390.085
4524 | 99024 | 15483.9 | 2.742E-05 | 6.357608E+00 | 32768.0 | 250631.340
4525 | 99072 | 15498.9 | 2.743E-05 | 6.439158E+00 | 32768.0 | 237183.590
4526 | 99120 | 15870.3 | 2.745E-05 | 6.477302E+00 | 32768.0 | 234590.425
4527 | 99168 | 15527.5 | 2.746E-05 | 6.404512E+00 | 32768.0 | 268737.102
4528 | 99216 | 15477.7 | 2.747E-05 | 6.357052E+00 | 32768.0 | 199055.934
4529 | 99264 | 15441.0 | 2.749E-05 | 6.418729E+00 | 32768.0 | 280337.259
4530 | 99312 | 15870.6 | 2.750E-05 | 6.394526E+00 | 32768.0 | 242159.812
4531 | 99360 | 15356.1 | 2.751E-05 | 6.454551E+00 | 32768.0 | 238356.429
4532 | 99408 | 15481.2 | 2.753E-05 | 6.479828E+00 | 32768.0 | 256781.681
4533 | 99456 | 15512.7 | 2.754E-05 | 6.347847E+00 | 32768.0 | 232593.280
4534 | 99504 | 16020.6 | 2.755E-05 | 6.361287E+00 | 32768.0 | 214859.706
4535 | 99552 | 15687.2 | 2.757E-05 | 6.344873E+00 | 32768.0 | 214653.297
4536 | 99600 | 15424.3 | 2.758E-05 | 6.273855E+00 | 32768.0 | 249309.228
4537 | 99648 | 15440.3 | 2.759E-05 | 6.373835E+00 | 32768.0 | 230963.275
4538 | 99696 | 15788.5 | 2.761E-05 | 6.381639E+00 | 32768.0 | 258586.304
4539 | 99744 | 15436.7 | 2.762E-05 | 6.464207E+00 | 32768.0 | 260715.522
4540 | 99792 | 15631.9 | 2.763E-05 | 6.282461E+00 | 32768.0 | 271394.559
4541 | 99840 | 15417.1 | 2.765E-05 | 6.323977E+00 | 32768.0 | 268740.684
4542 | 99888 | 15726.7 | 2.766E-05 | 6.419955E+00 | 32768.0 | 270171.155
4543 | 99936 | 15524.6 | 2.767E-05 | 6.456992E+00 | 32768.0 | 255182.014
4544 | 99984 | 15442.0 | 2.769E-05 | 6.327838E+00 | 32768.0 | 224129.919
4545 | 100032 | 15419.1 | 2.770E-05 | 6.374109E+00 | 32768.0 | 265872.290
4546 | 100080 | 15626.3 | 2.771E-05 | 6.332025E+00 | 32768.0 | 221965.501
4547 | 100128 | 15454.8 | 2.773E-05 | 6.399364E+00 | 32768.0 | 257839.194
4548 | 100176 | 15431.4 | 2.774E-05 | 6.411947E+00 | 32768.0 | 278135.374
4549 | 100224 | 15844.6 | 2.775E-05 | 6.477700E+00 | 32768.0 | 277855.734
4550 | 100272 | 15537.3 | 2.777E-05 | 6.526390E+00 | 32768.0 | 246063.278
4551 | 100320 | 15431.5 | 2.778E-05 | 6.391055E+00 | 32768.0 | 230174.666
4552 | 100368 | 15392.1 | 2.779E-05 | 6.381279E+00 | 32768.0 | 230427.324
4553 | 100416 | 15770.9 | 2.781E-05 | 6.438869E+00 | 32768.0 | 230488.615
4554 | 100464 | 15447.7 | 2.782E-05 | 6.245214E+00 | 32768.0 | 277295.687
4555 | 100512 | 15446.9 | 2.783E-05 | 6.413427E+00 | 32768.0 | 223183.539

[2021-09-24 20:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 20:07:07] PULSE: tr8-104B is running for 14:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))

iteration/159576 | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
4556 | 100560 | 15400.2 | 2.785E-05 | 6.398170E+00 | 32768.0 | 233778.721
4557 | 100608 | 15788.3 | 2.786E-05 | 6.417650E+00 | 32768.0 | 311870.109
4558 | 100656 | 15428.6 | 2.787E-05 | 6.394480E+00 | 32768.0 | 234331.495
4559 | 100704 | 15432.2 | 2.789E-05 | 6.379920E+00 | 32768.0 | 256774.134
4560 | 100752 | 15427.3 | 2.790E-05 | 6.398593E+00 | 32768.0 | 244274.326
4561 | 100800 | 15906.6 | 2.791E-05 | 6.370606E+00 | 32768.0 | 239881.224
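Each iteration record in this log follows the same fixed, pipe-delimited "key: value" layout, so series such as lm loss, loss scale, and grad norm can be recovered from the raw text mechanically. A minimal sketch in standard-library Python (the regex and the choice of extracted fields are illustrative, not part of the training tooling):

```python
import re

# One pattern per Megatron-style iteration record; fields are pipe-separated
# "key: value" pairs, so a few targeted groups are enough for plotting.
RECORD = re.compile(
    r"iteration\s+(?P<iter>\d+)/\s*\d+.*?"
    r"lm loss: (?P<loss>[\d.E+-]+).*?"
    r"loss scale: (?P<scale>[\d.]+).*?"
    r"grad norm: (?P<gnorm>[\d.]+)"
)

def parse_log(text):
    """Yield (iteration, lm_loss, loss_scale, grad_norm) tuples from raw log text."""
    for m in RECORD.finditer(text):
        yield (int(m["iter"]), float(m["loss"]), float(m["scale"]), float(m["gnorm"]))

sample = ("iteration 4556/ 159576 | consumed samples: 100560 | "
          "elapsed time per iteration (ms): 15400.2 | learning rate: 2.785E-05 | "
          "global batch size: 48 | lm loss: 6.398170E+00 | loss scale: 32768.0 | "
          "grad norm: 233778.721 | number of skipped iterations: 0 |")
print(next(parse_log(sample)))  # -> (4556, 6.39817, 32768.0, 233778.721)
```

Applied across the whole file, this yields per-iteration series that make events such as the grad-norm spikes and the loss-scale drop further below easy to spot.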
iteration/159576 | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
4562 | 100848 | 15436.7 | 2.793E-05 | 6.449897E+00 | 32768.0 | 244189.290
4563 | 100896 | 15423.9 | 2.794E-05 | 6.361297E+00 | 32768.0 | 214769.520
4564 | 100944 | 15485.4 | 2.795E-05 | 6.315623E+00 | 32768.0 | 238075.723
4565 | 100992 | 15712.7 | 2.797E-05 | 6.407779E+00 | 32768.0 | 219946.422
4566 | 101040 | 15450.4 | 2.798E-05 | 6.417436E+00 | 32768.0 | 240930.366
4567 | 101088 | 15429.7 | 2.799E-05 | 6.436010E+00 | 32768.0 | 314077.087
4568 | 101136 | 15422.9 | 2.801E-05 | 6.520737E+00 | 32768.0 | 274297.002
4569 | 101184 | 15586.4 | 2.802E-05 | 6.416994E+00 | 32768.0 | 231703.132
4570 | 101232 | 15422.0 | 2.803E-05 | 6.319811E+00 | 32768.0 | 231530.726
4571 | 101280 | 15338.3 | 2.805E-05 | 6.400026E+00 | 32768.0 | 257733.850
4572 | 101328 | 15446.6 | 2.806E-05 | 6.435762E+00 | 32768.0 | 268511.480
4573 | 101376 | 15589.8 | 2.807E-05 | 6.406414E+00 | 32768.0 | 233768.669
4574 | 101424 | 15349.3 | 2.809E-05 | 6.437346E+00 | 32768.0 | 269214.009
4575 | 101472 | 15388.4 | 2.810E-05 | 6.352981E+00 | 32768.0 | 243418.743
4576 | 101520 | 15469.0 | 2.811E-05 | 6.355519E+00 | 32768.0 | 255521.793
4577 | 101568 | 15986.1 | 2.813E-05 | 6.380365E+00 | 32768.0 | 263123.213
4578 | 101616 | 15483.5 | 2.814E-05 | 6.442792E+00 | 32768.0 | 264664.009
4579 | 101664 | 15482.0 | 2.815E-05 | 6.300795E+00 | 32768.0 | 263093.923
4580 | 101712 | 15915.5 | 2.817E-05 | 6.509340E+00 | 32768.0 | 325066.014
4581 | 101760 | 15478.8 | 2.818E-05 | 6.417569E+00 | 32768.0 | 317932.491
4582 | 101808 | 15467.6 | 2.819E-05 | 6.391977E+00 | 32768.0 | 265433.359
4583 | 101856 | 15463.2 | 2.821E-05 | 6.493138E+00 | 32768.0 | 262301.719
4584 | 101904 | 15787.5 | 2.822E-05 | 6.358137E+00 | 32768.0 | 302003.298
4585 | 101952 | 15486.8 | 2.823E-05 | 6.398649E+00 | 32768.0 | 241427.078
4586 | 102000 | 15502.1 | 2.825E-05 | 6.450002E+00 | 32768.0 | 288231.307
4587 | 102048 | 15613.4 | 2.826E-05 | 6.463566E+00 | 32768.0 | 255700.156
4588 | 102096 | 16100.7 | 2.827E-05 | 6.440113E+00 | 32768.0 | 228589.163
4589 | 102144 | 15550.6 | 2.829E-05 | 6.330764E+00 | 32768.0 | 253562.437
4590 | 102192 | 15504.0 | 2.830E-05 | 6.565317E+00 | 32768.0 | 248109.457
4591 | 102240 | 15500.8 | 2.831E-05 | 6.432470E+00 | 32768.0 | 258408.480
4592 | 102288 | 15682.0 | 2.833E-05 | 6.388723E+00 | 32768.0 | 255460.696
4593 | 102336 | 15624.8 | 2.834E-05 | 6.252523E+00 | 32768.0 | 247063.847
4594 | 102384 | 15619.9 | 2.835E-05 | 6.256584E+00 | 32768.0 | 252094.746
4595 | 102432 | 15618.3 | 2.837E-05 | 6.422144E+00 | 32768.0 | 327415.393
4596 | 102480 | 15731.1 | 2.838E-05 | 6.362859E+00 | 32768.0 | 271628.783
4597 | 102528 | 15470.5 | 2.839E-05 | 6.400634E+00 | 32768.0 | 270235.866
4598 | 102576 | 15494.8 | 2.841E-05 | 6.409593E+00 | 32768.0 | 246051.964
4599 | 102624 | 15503.4 | 2.842E-05 | 6.286301E+00 | 32768.0 | 315951.056
4600 | 102672 | 15657.8 | 2.843E-05 | 6.424391E+00 | 32768.0 | 257970.239
4601 | 102720 | 15415.9 | 2.845E-05 | 6.419086E+00 | 32768.0 | 232614.820
4602 | 102768 | 15506.4 | 2.846E-05 | 6.598701E+00 | 32768.0 | 269465.797
4603 | 102816 | 15842.0 | 2.847E-05 | 6.374152E+00 | 32768.0 | 256871.390
4604 | 102864 | 15661.0 | 2.849E-05 | 6.330672E+00 | 32768.0 | 261276.305
4605 | 102912 | 15453.1 | 2.850E-05 | 6.409989E+00 | 32768.0 | 213427.896
4606 | 102960 | 15529.1 | 2.851E-05 | 6.409967E+00 | 32768.0 | 343079.843
4607 | 103008 | 15784.9 | 2.853E-05 | 6.345381E+00 | 32768.0 | 288014.524
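With the elapsed time per iteration hovering around 15.5 s for a 48-sample global batch, these records also pin down the run's sustained throughput and a rough time to completion. A quick check (illustrative arithmetic only; the variable names are ad hoc):

```python
# Back-of-the-envelope throughput/ETA from the logged values above.
ms_per_iter = 15500.0        # typical "elapsed time per iteration (ms)" in this stretch
batch_size = 48              # "global batch size"
done, total = 4600, 159576   # current iteration / planned iterations

samples_per_sec = batch_size / (ms_per_iter / 1000)           # ~3.1 samples/s
remaining_days = (total - done) * ms_per_iter / 1000 / 86400  # ~27.8 days of compute
print(f"{samples_per_sec:.2f} samples/s, ~{remaining_days:.1f} days to go at this rate")
```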
iteration/159576 | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
4608 | 103056 | 15407.4 | 2.854E-05 | 6.160167E+00 | 32768.0 | 236948.790
4609 | 103104 | 15521.9 | 2.855E-05 | 6.368454E+00 | 32768.0 | 346716.620
4610 | 103152 | 15546.6 | 2.857E-05 | 6.485950E+00 | 32768.0 | 249193.625
4611 | 103200 | 15842.5 | 2.858E-05 | 6.433112E+00 | 32768.0 | 245691.542
4612 | 103248 | 15452.2 | 2.859E-05 | 6.453573E+00 | 32768.0 | 326844.652
4613 | 103296 | 15454.7 | 2.861E-05 | 6.431165E+00 | 32768.0 | 289334.369
4614 | 103344 | 15458.5 | 2.862E-05 | 6.229577E+00 | 32768.0 | 256574.569
4615 | 103392 | 15900.6 | 2.863E-05 | 6.432065E+00 | 32768.0 | 273324.041
4616 | 103440 | 15568.2 | 2.865E-05 | 6.373868E+00 | 32768.0 | 289471.232
4617 | 103488 | 15491.7 | 2.866E-05 | 6.302549E+00 | 32768.0 | 421148.983
4618 | 103536 | 15549.9 | 2.867E-05 | 6.278319E+00 | 32768.0 | 346570.622
4619 | 103584 | 15749.4 | 2.869E-05 | 6.394638E+00 | 32768.0 | 356110.872
4620 | 103632 | 15472.2 | 2.870E-05 | 6.303448E+00 | 32768.0 | 328724.972
4621 | 103680 | 15427.3 | 2.871E-05 | 6.544609E+00 | 32768.0 | 324100.834
4622 | 103728 | 15472.5 | 2.873E-05 | 6.314513E+00 | 32768.0 | 275878.819
4623 | 103776 | 15583.2 | 2.874E-05 | 6.398262E+00 | 32768.0 | 263126.230
4624 | 103824 | 15483.7 | 2.875E-05 | 6.474843E+00 | 32768.0 | 242329.963
4625 | 103872 | 15477.6 | 2.877E-05 | 6.408014E+00 | 32768.0 | 267696.261
4626 | 103920 | 15516.2 | 2.878E-05 | 6.847461E+00 | 32768.0 | 713094.141
4627 | 103968 | 15724.2 | 2.879E-05 | 6.386415E+00 | 32768.0 | 272846.125
4628 | 104016 | 15456.1 | 2.881E-05 | 6.446278E+00 | 32768.0 | 379795.778
4629 | 104064 | 15435.5 | 2.882E-05 | 6.469239E+00 | 32768.0 | 207715.801
4630 | 104112 | 15698.1 | 2.883E-05 | 6.357453E+00 | 32768.0 | 236792.203
4631 | 104160 | 15489.5 | 2.885E-05 | 6.448473E+00 | 32768.0 | 225431.411
4632 | 104208 | 15562.5 | 2.886E-05 | 6.377034E+00 | 32768.0 | 375353.459
4633 | 104256 | 15569.5 | 2.887E-05 | 6.516908E+00 | 32768.0 | 333588.373
4634 | 104304 | 15928.9 | 2.889E-05 | 6.574339E+00 | 32768.0 | 243589.856
4635 | 104352 | 15531.5 | 2.890E-05 | 6.475029E+00 | 32768.0 | 442923.681
4636 | 104400 | 15560.0 | 2.891E-05 | 6.369026E+00 | 32768.0 | 295484.961
4637 | 104448 | 15543.7 | 2.893E-05 | 6.490546E+00 | 32768.0 | 279233.122
4638 | 104496 | 15916.4 | 2.894E-05 | 6.437621E+00 | 32768.0 | 245214.935
4639 | 104544 | 15547.5 | 2.895E-05 | 6.491655E+00 | 32768.0 | 240217.342
4640 | 104592 | 15573.7 | 2.897E-05 | 6.455505E+00 | 32768.0 | 317400.165
4641 | 104640 | 15624.7 | 2.898E-05 | 6.482111E+00 | 32768.0 | 244102.198
4642 | 104688 | 16106.5 | 2.899E-05 | 6.281504E+00 | 32768.0 | 282861.527
4643 | 104736 | 15639.7 | 2.901E-05 | 6.420715E+00 | 32768.0 | 274009.202
4644 | 104784 | 15520.7 | 2.902E-05 | 6.342989E+00 | 32768.0 | 226933.382
4645 | 104832 | 15501.6 | 2.903E-05 | 6.427937E+00 | 32768.0 | 278047.939
4646 | 104880 | 15629.3 | 2.905E-05 | 6.294481E+00 | 32768.0 | 235356.190
4647 | 104928 | 15591.9 | 2.906E-05 | 6.363388E+00 | 32768.0 | 600293.405
4648 | 104976 | 15595.2 | 2.907E-05 | 6.377505E+00 | 32768.0 | 331377.856
4649 | 105024 | 15628.4 | 2.909E-05 | 6.381812E+00 | 32768.0 | 200005.238
4650 | 105072 | 15748.7 | 2.910E-05 | 6.338908E+00 | 32768.0 | 242913.858
4651 | 105120 | 15511.3 | 2.911E-05 | 6.419736E+00 | 32768.0 | 330409.745
4652 | 105168 | 15516.3 | 2.913E-05 | 6.404620E+00 | 32768.0 | 318144.336
4653 | 105216 | 15876.3 | 2.914E-05 | 6.377990E+00 | 32768.0 | 232202.485
4654 | 105264 | 15718.5 | 2.915E-05 | 6.383665E+00 | 32768.0 | 241524.475
4655 | 105312 | 15610.4 | 2.917E-05 | 6.403493E+00 | 32768.0 | 373231.364
4656 | 105360 | 15640.8 | 2.918E-05 | 6.329133E+00 | 32768.0 | 286954.758
4657 | 105408 | 15996.4 | 2.919E-05 | 6.748344E+00 | 32768.0 | 260947.100
4658 | 105456 | 15522.2 | 2.921E-05 | 6.315388E+00 | 32768.0 | 279560.800
4659 | 105504 | 15546.8 | 2.922E-05 | 6.351707E+00 | 32768.0 | 270238.544
4660 | 105552 | 15483.2 | 2.923E-05 | 6.338678E+00 | 32768.0 | 299765.314
4661 | 105600 | 15828.0 | 2.925E-05 | 6.427124E+00 | 32768.0 | 302484.019
4662 | 105648 | 15644.1 | 2.926E-05 | 6.407690E+00 | 32768.0 | 286169.997
4663 | 105696 | 15583.7 | 2.927E-05 | 6.254132E+00 | 32768.0 | 276778.381
4664 | 105744 | 15651.6 | 2.929E-05 | 6.469905E+00 | 32768.0 | 279741.368
4665 | 105792 | 15818.3 | 2.930E-05 | 6.508596E+00 | 32768.0 | 336670.270
4666 | 105840 | 15552.5 | 2.931E-05 | 6.434944E+00 | 32768.0 | 242396.784
4667 | 105888 | 15512.6 | 2.933E-05 | 6.510550E+00 | 32768.0 | 252220.315
4668 | 105936 | 15495.7 | 2.934E-05 | 6.399008E+00 | 32768.0 | 288495.864
4669 | 105984 | 15668.5 | 2.935E-05 | 6.404999E+00 | 32768.0 | 244327.032
4670 | 106032 | 15562.9 | 2.937E-05 | 6.418772E+00 | 32768.0 | 313672.915
4671 | 106080 | 15630.7 | 2.938E-05 | 6.361070E+00 | 32768.0 | 276763.857
4672 | 106128 | 15597.8 | 2.939E-05 | 6.477580E+00 | 32768.0 | 230503.822
4673 | 106176 | 15696.4 | 2.941E-05 | 6.517149E+00 | 32768.0 | 217937.765
4674 | 106224 | 15548.7 | 2.942E-05 | 6.380251E+00 | 32768.0 | 267703.433
4675 | 106272 | 15515.6 | 2.943E-05 | 6.348250E+00 | 32768.0 | 309305.174
4676 | 106320 | 15795.7 | 2.945E-05 | 6.461040E+00 | 32768.0 | 285074.708
4677 | 106368 | 15718.4 | 2.946E-05 | 6.388801E+00 | 32768.0 | 292644.236
4678 | 106416 | 15585.4 | 2.947E-05 | 6.417225E+00 | 32768.0 | 334812.598
4679 | 106464 | 15631.1 | 2.949E-05 | 6.357790E+00 | 32768.0 | 301017.925
4680 | 106512 | 15891.7 | 2.950E-05 | 6.556364E+00 | 32768.0 | 280065.506
4681 | 106560 | 15562.2 | 2.951E-05 | 6.393982E+00 | 32768.0 | 242731.164
4682 | 106608 | 15526.5 | 2.953E-05 | 6.396220E+00 | 32768.0 | 407344.753
4683 | 106656 | 15526.3 | 2.954E-05 | 6.396249E+00 | 32768.0 | 300342.299
4684 | 106704 | 15885.4 | 2.955E-05 | 6.375283E+00 | 32768.0 | 296501.436
4685 | 106752 | 15527.4 | 2.957E-05 | 6.418046E+00 | 32768.0 | 290100.249
4686 | 106800 | 15621.1 | 2.958E-05 | 6.300463E+00 | 32768.0 | 265814.471
4687 | 106848 | 15592.0 | 2.959E-05 | 6.440179E+00 | 32768.0 | 354690.307
4688 | 106896 | 15963.5 | 2.961E-05 | 6.396194E+00 | 32768.0 | 259594.010
4689 | 106944 | 15540.2 | 2.962E-05 | 6.459390E+00 | 32768.0 | 326661.756
4690 | 106992 | 15512.7 | 2.963E-05 | 6.324084E+00 | 32768.0 | 288829.158
4691 | 107040 | 8709.6 | 2.963E-05 | 6.781525E+00 | 16384.0 | 288829.158
4692 | 107088 | 15305.7 | 2.964E-05 | 6.431325E+00 | 16384.0 | 145022.360
4693 | 107136 | 15550.9 | 2.966E-05 | 6.516616E+00 | 16384.0 | 155613.709
4694 | 107184 | 15526.9 | 2.967E-05 | 6.387960E+00 | 16384.0 | 134461.471
4695 | 107232 | 15497.0 | 2.968E-05 | 6.392653E+00 | 16384.0 | 141822.076
4696 | 107280 | 15923.9 | 2.970E-05 | 6.412030E+00 | 16384.0 | 175057.651
4697 | 107328 | 15425.2 | 2.971E-05 | 6.373864E+00 | 16384.0 | 282779.549
4698 | 107376 | 15454.6 | 2.972E-05 | 6.306759E+00 | 16384.0 | 136700.298
4699 | 107424 | 15528.9 | 2.974E-05 | 6.335629E+00 | 16384.0 | 184501.539
4700 | 107472 | 15956.8 | 2.975E-05 | 6.408161E+00 | 16384.0 | 173148.921
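Iteration 4691 above shows the fp16 dynamic loss scaler backing off: the step completed in 8709.6 ms instead of the usual ~15500 ms, the grad norm repeats the value from iteration 4690 verbatim, and the loss scale halves from 32768.0 to 16384.0, all consistent with a gradient overflow (most fp16 schemes also skip the affected optimizer step, though this log keeps reporting zero skipped iterations). A minimal sketch of the usual update rule; the backoff factor and growth interval below are common defaults, not values read from this run:

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: halve on overflow, grow after a run of good steps.

    The 0.5 backoff and 1000-step growth interval mirror common fp16 defaults;
    they are assumptions, not constants taken from this log.
    """

    def __init__(self, scale=32768.0, backoff=0.5, growth_factor=2.0, growth_interval=1000):
        self.scale = scale
        self.backoff = backoff
        self.growth_factor = growth_factor
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if found_overflow:
            # Back off, the 32768.0 -> 16384.0 move visible at iteration 4691;
            # whether the step is also skipped depends on the optimizer.
            self.scale *= self.backoff
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor  # cautiously grow back
        return True

scaler = DynamicLossScaler()
scaler.update(found_overflow=True)
print(scaler.scale)  # 16384.0
```

The reduced scale of 16384.0 then persists through the remainder of this span.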
iteration/159576 | consumed samples | elapsed ms/iter | learning rate | lm loss | loss scale | grad norm
4701 | 107520 | 15601.2 | 2.976E-05 | 6.452803E+00 | 16384.0 | 175212.053
4702 | 107568 | 15499.9 | 2.978E-05 | 6.444376E+00 | 16384.0 | 154484.468
4703 | 107616 | 15505.8 | 2.979E-05 | 6.378032E+00 | 16384.0 | 157853.641
4704 | 107664 | 15797.2 | 2.980E-05 | 6.433157E+00 | 16384.0 | 189038.636
4705 | 107712 | 15428.0 | 2.982E-05 | 6.345381E+00 | 16384.0 | 223066.594
4706 | 107760 | 15506.2 | 2.983E-05 | 6.409193E+00 | 16384.0 | 138366.342
4707 | 107808 | 15469.9 | 2.984E-05 | 6.454758E+00 | 16384.0 | 144072.711
4708 | 107856 | 15711.5 | 2.986E-05 | 6.418115E+00 | 16384.0 | 160060.361
4709 | 107904 | 15549.5 | 2.987E-05 | 6.323099E+00 | 16384.0 | 158794.827
4710 | 107952 | 15458.0 | 2.988E-05 | 6.418284E+00 | 16384.0 | 172985.051
4711 | 108000 | 15477.2 | 2.990E-05 | 6.449984E+00 | 16384.0 | 151942.015
4712 | 108048 | 15912.6 | 2.991E-05 | 6.331490E+00 | 16384.0 | 148710.284
4713 | 108096 | 15440.5 | 2.992E-05 | 6.445600E+00 | 16384.0 | 136119.725
4714 | 108144 | 15519.8 | 2.994E-05 | 6.276518E+00 | 16384.0 | 170811.199
4715 | 108192 | 15866.2 | 2.995E-05 | 6.430917E+00 | 16384.0 | 145058.329
4716 | 108240 | 15520.8 | 2.996E-05 | 6.459754E+00 | 16384.0 | 146862.274
4717 | 108288 | 15578.0 | 2.998E-05 | 6.447017E+00 | 16384.0 | 172505.739
4718 | 108336 | 15434.8 | 2.999E-05 | 6.316633E+00 | 16384.0 | 130149.169
4719 | 108384 | 15703.7 | 3.000E-05 | 6.376626E+00 | 16384.0 | 198273.301
4720 | 108432 | 15522.7 | 3.002E-05 | 6.340569E+00 | 16384.0 | 189583.946
4721 | 108480 | 15419.9 | 3.003E-05 | 6.519832E+00 | 16384.0 | 148280.410
4722 | 108528 | 15537.6 | 3.004E-05 | 6.519564E+00 | 16384.0 | 165136.082
4723 | 108576 | 15984.2 | 3.006E-05 | 6.331813E+00 | 16384.0 | 137134.914
4724 | 108624 | 15591.8 | 3.007E-05 | 6.417581E+00 | 16384.0 | 135525.990
4725 | 108672 | 15458.7 | 3.008E-05 | 6.369280E+00 | 16384.0 | 135730.698
4726 | 108720 | 15476.9 | 3.010E-05 | 6.320598E+00 | 16384.0 | 147233.060
4727 | 108768 | 15812.7 | 3.011E-05 | 6.469586E+00 | 16384.0 | 164519.317
4728 | 108816 | 15490.9 | 3.012E-05 | 6.473386E+00 | 16384.0 | 151619.547
4729 | 108864 | 15470.7 | 3.014E-05 | 6.340328E+00 | 16384.0 | 137036.044
4730 | 108912 | 15531.2 | 3.015E-05 | 6.394744E+00 | 16384.0 | 146186.033
4731 | 108960 | 15606.4 | 3.016E-05 | 6.362489E+00 | 16384.0 | 187444.936
4732 | 109008 | 15504.3 | 3.018E-05 | 6.456880E+00 | 16384.0 | 129595.559
4733 | 109056 | 15474.7 | 3.019E-05 | 6.443705E+00 | 16384.0 | 137176.536
4734 | 109104 | 15468.7 | 3.020E-05 | 6.325924E+00 | 16384.0 | 130886.931
4735 | 109152 | 15622.9 | 3.022E-05 | 6.367020E+00 | 16384.0 | 133365.928
4736 | 109200 | 15496.0 | 3.023E-05 | 6.366150E+00 | 16384.0 | 170880.695
4737 | 109248 | 15489.1 | 3.024E-05 | 6.352594E+00 | 16384.0 | 126383.624
4738 | 109296 | 15753.5 | 3.026E-05 | 6.439698E+00 | 16384.0 | 178764.163
4739 | 109344 | 15669.9 | 3.027E-05 | 6.379218E+00 | 16384.0 | 140248.496
4740 | 109392 | 15472.2 | 3.028E-05 | 6.455700E+00 | 16384.0 | 141297.672
4741 | 109440 | 15470.3 | 3.030E-05 | 6.395582E+00 | 16384.0 | 132933.676
4742 | 109488 | 15846.4 | 3.031E-05 | 6.391361E+00 | 16384.0 | 118703.557
4743 | 109536 | 15513.5 | 3.032E-05 | 6.428627E+00 | 16384.0 | 138048.574
4744 | 109584 | 15514.2 | 3.034E-05 | 6.294309E+00 | 16384.0 | 140003.576
4745 | 109632 | 15479.8 | 3.035E-05 | 6.442544E+00 | 16384.0 | 137520.854
4746 | 109680 | 15909.9 | 3.036E-05 | 6.330937E+00 | 16384.0 | 133869.361
iteration 4747/
159576 | consumed samples: 109728 | elapsed time per iteration (ms): 15438.5 | learning rate: 3.038E-05 | global batch size: 48 | lm loss: 6.375879E+00 | loss scale: 16384.0 | grad norm: 186074.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4748/ 159576 | consumed samples: 109776 | elapsed time per iteration (ms): 15478.1 | learning rate: 3.039E-05 | global batch size: 48 | lm loss: 6.291435E+00 | loss scale: 16384.0 | grad norm: 133042.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4749/ 159576 | consumed samples: 109824 | elapsed time per iteration (ms): 15511.0 | learning rate: 3.040E-05 | global batch size: 48 | lm loss: 6.392264E+00 | loss scale: 16384.0 | grad norm: 142954.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4750/ 159576 | consumed samples: 109872 | elapsed time per iteration (ms): 15876.7 | learning rate: 3.042E-05 | global batch size: 48 | lm loss: 7.872174E+00 | loss scale: 16384.0 | grad norm: 409825.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4751/ 159576 | consumed samples: 109920 | elapsed time per iteration (ms): 15539.2 | learning rate: 3.043E-05 | global batch size: 48 | lm loss: 6.478594E+00 | loss scale: 16384.0 | grad norm: 125638.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4752/ 159576 | consumed samples: 109968 | elapsed time per iteration (ms): 15507.7 | learning rate: 3.044E-05 | global batch size: 48 | lm loss: 6.357571E+00 | loss scale: 16384.0 | grad norm: 108403.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4753/ 159576 | consumed samples: 110016 | elapsed time per iteration (ms): 15485.4 | learning rate: 3.046E-05 | global batch size: 48 | lm loss: 6.517112E+00 | loss scale: 16384.0 | grad norm: 101971.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4754/ 159576 | consumed samples: 110064 | elapsed time per iteration (ms): 15669.7 | learning rate: 3.047E-05 | global batch size: 48 | lm loss: 6.311660E+00 | loss scale: 16384.0 | grad norm: 117424.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4755/ 159576 | consumed samples: 110112 | elapsed time per iteration (ms): 15529.0 | learning rate: 3.048E-05 | global batch size: 48 | lm loss: 6.452873E+00 | loss scale: 16384.0 | grad norm: 153333.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4756/ 159576 | consumed samples: 110160 | elapsed time per iteration (ms): 15556.8 | learning rate: 3.050E-05 | global batch size: 48 | lm loss: 6.470776E+00 | loss scale: 16384.0 | grad norm: 123606.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4757/ 159576 | consumed samples: 110208 | elapsed time per iteration (ms): 15535.1 | learning rate: 3.051E-05 | global batch size: 48 | lm loss: 6.444992E+00 | loss scale: 16384.0 | grad norm: 103337.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4758/ 159576 | consumed samples: 110256 | elapsed time per iteration (ms): 15670.4 | learning rate: 3.052E-05 | global batch size: 48 | lm loss: 6.402925E+00 | loss scale: 16384.0 | 
grad norm: 145142.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4759/ 159576 | consumed samples: 110304 | elapsed time per iteration (ms): 15615.8 | learning rate: 3.054E-05 | global batch size: 48 | lm loss: 6.383159E+00 | loss scale: 16384.0 | grad norm: 115666.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4760/ 159576 | consumed samples: 110352 | elapsed time per iteration (ms): 15593.7 | learning rate: 3.055E-05 | global batch size: 48 | lm loss: 6.288662E+00 | loss scale: 16384.0 | grad norm: 125590.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4761/ 159576 | consumed samples: 110400 | elapsed time per iteration (ms): 15582.7 | learning rate: 3.056E-05 | global batch size: 48 | lm loss: 6.460382E+00 | loss scale: 16384.0 | grad norm: 131535.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4762/ 159576 | consumed samples: 110448 | elapsed time per iteration (ms): 15777.3 | learning rate: 3.058E-05 | global batch size: 48 | lm loss: 6.421331E+00 | loss scale: 16384.0 | grad norm: 123507.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4763/ 159576 | consumed samples: 110496 | elapsed time per iteration (ms): 15542.1 | learning rate: 3.059E-05 | global batch size: 48 | lm loss: 6.471745E+00 | loss scale: 16384.0 | grad norm: 142533.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4764/ 159576 | consumed samples: 110544 | elapsed time per iteration (ms): 15505.7 | learning rate: 3.060E-05 | global batch size: 48 | lm loss: 6.437591E+00 | loss scale: 16384.0 | grad norm: 150206.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4765/ 159576 | consumed samples: 110592 | elapsed time per iteration (ms): 15784.9 | learning rate: 3.062E-05 | global batch size: 48 | lm loss: 6.426904E+00 | loss scale: 16384.0 | grad norm: 117533.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4766/ 159576 | consumed samples: 110640 | elapsed time per iteration (ms): 15571.9 | learning rate: 3.063E-05 | global batch size: 48 | lm loss: 6.361554E+00 | loss scale: 16384.0 | grad norm: 125319.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4767/ 159576 | consumed samples: 110688 | elapsed time per iteration (ms): 15502.5 | learning rate: 3.064E-05 | global batch size: 48 | lm loss: 6.404096E+00 | loss scale: 16384.0 | grad norm: 137718.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4768/ 159576 | consumed samples: 110736 | elapsed time per iteration (ms): 15543.8 | learning rate: 3.066E-05 | global batch size: 48 | lm loss: 6.437445E+00 | loss scale: 16384.0 | grad norm: 138623.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4769/ 159576 | consumed samples: 110784 | elapsed time per iteration (ms): 15859.0 | learning rate: 3.067E-05 | global batch size: 48 | lm loss: 6.395863E+00 | loss scale: 16384.0 | grad norm: 127878.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4770/ 159576 | consumed samples: 110832 | elapsed 
time per iteration (ms): 15536.9 | learning rate: 3.068E-05 | global batch size: 48 | lm loss: 6.561028E+00 | loss scale: 16384.0 | grad norm: 124917.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4771/ 159576 | consumed samples: 110880 | elapsed time per iteration (ms): 15506.9 | learning rate: 3.070E-05 | global batch size: 48 | lm loss: 6.471921E+00 | loss scale: 16384.0 | grad norm: 161855.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4772/ 159576 | consumed samples: 110928 | elapsed time per iteration (ms): 15469.5 | learning rate: 3.071E-05 | global batch size: 48 | lm loss: 6.442107E+00 | loss scale: 16384.0 | grad norm: 174619.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4773/ 159576 | consumed samples: 110976 | elapsed time per iteration (ms): 15874.3 | learning rate: 3.072E-05 | global batch size: 48 | lm loss: 6.450697E+00 | loss scale: 16384.0 | grad norm: 128857.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4774/ 159576 | consumed samples: 111024 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.074E-05 | global batch size: 48 | lm loss: 6.409184E+00 | loss scale: 16384.0 | grad norm: 167963.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4775/ 159576 | consumed samples: 111072 | elapsed time per iteration (ms): 15524.6 | learning rate: 3.075E-05 | global batch size: 48 | lm loss: 6.521546E+00 | loss scale: 16384.0 | grad norm: 160789.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4776/ 159576 | consumed samples: 111120 | elapsed time per iteration (ms): 15522.1 | learning rate: 3.076E-05 | global batch size: 48 | lm loss: 6.392659E+00 | loss scale: 16384.0 | grad norm: 144341.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4777/ 159576 | consumed samples: 111168 | elapsed time per iteration (ms): 15807.4 | learning rate: 3.078E-05 | global batch size: 48 | lm loss: 6.295141E+00 | loss scale: 16384.0 | grad norm: 127243.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4778/ 159576 | consumed samples: 111216 | elapsed time per iteration (ms): 15569.3 | learning rate: 3.079E-05 | global batch size: 48 | lm loss: 6.327214E+00 | loss scale: 16384.0 | grad norm: 126284.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4779/ 159576 | consumed samples: 111264 | elapsed time per iteration (ms): 15403.5 | learning rate: 3.080E-05 | global batch size: 48 | lm loss: 6.573749E+00 | loss scale: 16384.0 | grad norm: 122918.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4780/ 159576 | consumed samples: 111312 | elapsed time per iteration (ms): 15381.1 | learning rate: 3.082E-05 | global batch size: 48 | lm loss: 6.433424E+00 | loss scale: 16384.0 | grad norm: 124694.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4781/ 159576 | consumed samples: 111360 | elapsed time per iteration (ms): 15664.5 | learning rate: 3.083E-05 | global batch size: 48 | lm loss: 6.469074E+00 | loss scale: 16384.0 | grad norm: 147526.104 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4782/ 159576 | consumed samples: 111408 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.084E-05 | global batch size: 48 | lm loss: 6.349575E+00 | loss scale: 16384.0 | grad norm: 124417.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4783/ 159576 | consumed samples: 111456 | elapsed time per iteration (ms): 15497.8 | learning rate: 3.086E-05 | global batch size: 48 | lm loss: 6.254411E+00 | loss scale: 16384.0 | grad norm: 132978.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4784/ 159576 | consumed samples: 111504 | elapsed time per iteration (ms): 15491.3 | learning rate: 3.087E-05 | global batch size: 48 | lm loss: 6.407672E+00 | loss scale: 16384.0 | grad norm: 136226.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4785/ 159576 | consumed samples: 111552 | elapsed time per iteration (ms): 15585.7 | learning rate: 3.088E-05 | global batch size: 48 | lm loss: 6.340271E+00 | loss scale: 16384.0 | grad norm: 143403.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4786/ 159576 | consumed samples: 111600 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.090E-05 | global batch size: 48 | lm loss: 6.532565E+00 | loss scale: 16384.0 | grad norm: 146250.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4787/ 159576 | consumed samples: 111648 | elapsed time per iteration (ms): 15350.6 | learning rate: 3.091E-05 | global batch size: 48 | lm loss: 6.387796E+00 | loss scale: 16384.0 | grad norm: 219665.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 21:07:15] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 21:07:15] PULSE: tr8-104B is running for 15:15:04 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
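The two PULSE lines above are the chronicle's scheduler heartbeat: the current SLURM array job (1162855_1) has been running for 15h15m on the listed 'gpu_p13' nodes, and a follow-up job (1165978_[1-10%1]) is queued behind it via the dependency mechanism. Everything else in this stretch is the regular Megatron iteration record, which is regular enough to mine mechanically. Below is a minimal parsing sketch, assuming the log has been saved to a local file (the name tr8-104B.log is a placeholder, not part of the log): it extracts the per-iteration fields and flags loss spikes such as iteration 4750 further up, where lm loss jumped to 7.87 against a ~6.4 baseline with a grad norm near 4.1e5.

import re
from pathlib import Path

LOG = Path("tr8-104B.log")  # placeholder path for a saved copy of this log

# One pattern per iteration record, matching the field layout in this file.
REC = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \| consumed samples:\s+(\d+) \| "
    r"elapsed time per iteration \(ms\): ([\d.]+) \| learning rate: ([\d.E+-]+) \| "
    r"global batch size:\s+(\d+) \| lm loss: ([\d.E+-]+) \| loss scale: ([\d.]+) \| "
    r"grad norm: ([\d.]+)"
)

def records(text):
    for m in REC.finditer(text):
        yield {"iter": int(m.group(1)), "loss": float(m.group(6)),
               "scale": float(m.group(7)), "grad_norm": float(m.group(8))}

recs = list(records(LOG.read_text()))
for i in range(20, len(recs)):
    baseline = sum(r["loss"] for r in recs[i - 20:i]) / 20
    if recs[i]["loss"] > baseline + 0.5:  # crude spike threshold
        r = recs[i]
        print(f"spike at iteration {r['iter']}: lm loss {r['loss']:.3f} "
              f"(trailing mean {baseline:.3f}), grad norm {r['grad_norm']:.0f}")

The 0.5 threshold and 20-iteration window are arbitrary choices for illustration; the point is that a single-step excursion like iteration 4750, which the very next step recovers from, is easy to surface automatically.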
iteration 4788/ 159576 | consumed samples: 111696 | elapsed time per iteration (ms): 15408.2 | learning rate: 3.092E-05 | global batch size: 48 | lm loss: 6.385682E+00 | loss scale: 16384.0 | grad norm: 135205.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4789/ 159576 | consumed samples: 111744 | elapsed time per iteration (ms): 15723.0 | learning rate: 3.094E-05 | global batch size: 48 | lm loss: 6.382418E+00 | loss scale: 16384.0 | grad norm: 135775.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4790/ 159576 | consumed samples: 111792 | elapsed time per iteration (ms): 15412.3 | learning rate: 3.095E-05 | global batch size: 48 | lm loss: 6.349115E+00 | loss scale: 16384.0 | grad norm: 161890.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4791/ 159576 | consumed samples: 111840 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.096E-05 | global batch size: 48 | lm loss: 6.551302E+00 | loss scale: 16384.0 | grad norm: 160659.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4792/ 159576 | consumed samples: 111888 | elapsed time per iteration (ms): 15819.0 | learning rate: 3.098E-05 | global batch size: 48 | lm loss: 6.439594E+00 | loss scale: 16384.0 | grad norm: 133779.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4793/ 159576 | consumed samples: 111936 | elapsed time per iteration (ms): 15566.2 | learning rate: 3.099E-05 | global batch size: 48 | lm loss: 6.469571E+00 | loss scale: 16384.0 | grad norm: 134021.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4794/ 159576 | consumed samples: 111984 | elapsed time per iteration (ms): 15417.1 | learning rate: 3.100E-05 | global batch size: 48 | lm loss: 6.302731E+00 | loss scale: 16384.0 | grad norm: 144273.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4795/ 159576 | consumed samples: 112032 | elapsed time per iteration (ms): 15348.6 | learning rate: 3.102E-05 | global batch size: 48 | lm loss: 6.524598E+00 | loss scale: 16384.0 | grad norm: 173531.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4796/ 159576 | consumed samples: 112080 | elapsed time per iteration (ms): 15687.5 | learning rate: 3.103E-05 | global batch size: 48 | lm loss: 6.379292E+00 | loss scale: 16384.0 | grad norm: 135799.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4797/ 159576 | consumed samples: 112128 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.104E-05 | global batch size: 48 | lm loss: 6.363866E+00 | loss scale: 16384.0 | grad norm: 157197.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4798/ 159576 | consumed samples: 112176 | elapsed time per iteration (ms): 15407.8 | learning rate: 3.106E-05 | global batch size: 48 | lm loss: 6.301018E+00 | loss scale: 16384.0 | grad norm: 157927.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4799/ 159576 | consumed samples: 112224 | elapsed time per iteration (ms): 15420.4 | learning rate: 3.107E-05 | global batch size: 48 | lm loss: 6.529522E+00 | loss scale: 16384.0 | grad norm: 161359.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4800/ 159576 | consumed samples: 112272 | elapsed time per iteration (ms): 15797.9 | learning rate: 3.108E-05 | global batch size: 48 | lm loss: 6.347914E+00 | loss scale: 16384.0 | grad norm: 147972.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4801/ 159576 | consumed samples: 112320 | elapsed time per iteration (ms): 15327.2 | learning rate: 3.110E-05 | global batch size: 48 | lm loss: 6.375738E+00 | loss scale: 16384.0 | grad norm: 153820.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4802/ 159576 | consumed samples: 112368 | elapsed time per iteration (ms): 15430.2 | learning rate: 3.111E-05 | global batch size: 48 | lm loss: 6.380699E+00 | loss scale: 16384.0 | grad norm: 200141.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4803/ 159576 | consumed samples: 112416 | elapsed
time per iteration (ms): 15437.0 | learning rate: 3.112E-05 | global batch size: 48 | lm loss: 6.346474E+00 | loss scale: 16384.0 | grad norm: 150956.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4804/ 159576 | consumed samples: 112464 | elapsed time per iteration (ms): 15932.7 | learning rate: 3.114E-05 | global batch size: 48 | lm loss: 6.424392E+00 | loss scale: 16384.0 | grad norm: 144387.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4805/ 159576 | consumed samples: 112512 | elapsed time per iteration (ms): 15535.0 | learning rate: 3.115E-05 | global batch size: 48 | lm loss: 6.327216E+00 | loss scale: 16384.0 | grad norm: 145981.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4806/ 159576 | consumed samples: 112560 | elapsed time per iteration (ms): 15433.8 | learning rate: 3.116E-05 | global batch size: 48 | lm loss: 6.352614E+00 | loss scale: 16384.0 | grad norm: 159012.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4807/ 159576 | consumed samples: 112608 | elapsed time per iteration (ms): 15389.4 | learning rate: 3.118E-05 | global batch size: 48 | lm loss: 6.523698E+00 | loss scale: 16384.0 | grad norm: 183142.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4808/ 159576 | consumed samples: 112656 | elapsed time per iteration (ms): 15811.1 | learning rate: 3.119E-05 | global batch size: 48 | lm loss: 6.425416E+00 | loss scale: 16384.0 | grad norm: 158356.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4809/ 159576 | consumed samples: 112704 | elapsed time per iteration (ms): 15390.9 | learning rate: 3.120E-05 | global batch size: 48 | lm loss: 6.460537E+00 | loss scale: 16384.0 | grad norm: 160752.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4810/ 159576 | consumed samples: 112752 | elapsed time per iteration (ms): 15403.0 | learning rate: 3.122E-05 | global batch size: 48 | lm loss: 6.358703E+00 | loss scale: 16384.0 | grad norm: 136445.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4811/ 159576 | consumed samples: 112800 | elapsed time per iteration (ms): 15361.3 | learning rate: 3.123E-05 | global batch size: 48 | lm loss: 6.445686E+00 | loss scale: 16384.0 | grad norm: 150287.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4812/ 159576 | consumed samples: 112848 | elapsed time per iteration (ms): 15635.2 | learning rate: 3.124E-05 | global batch size: 48 | lm loss: 6.351339E+00 | loss scale: 16384.0 | grad norm: 127746.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4813/ 159576 | consumed samples: 112896 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.126E-05 | global batch size: 48 | lm loss: 6.509888E+00 | loss scale: 16384.0 | grad norm: 142135.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4814/ 159576 | consumed samples: 112944 | elapsed time per iteration (ms): 15373.2 | learning rate: 3.127E-05 | global batch size: 48 | lm loss: 6.393768E+00 | loss scale: 16384.0 | grad norm: 140003.150 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4815/ 159576 | consumed samples: 112992 | elapsed time per iteration (ms): 15438.1 | learning rate: 3.128E-05 | global batch size: 48 | lm loss: 6.501161E+00 | loss scale: 16384.0 | grad norm: 148857.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4816/ 159576 | consumed samples: 113040 | elapsed time per iteration (ms): 15632.8 | learning rate: 3.130E-05 | global batch size: 48 | lm loss: 6.330061E+00 | loss scale: 16384.0 | grad norm: 147693.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4817/ 159576 | consumed samples: 113088 | elapsed time per iteration (ms): 15360.6 | learning rate: 3.131E-05 | global batch size: 48 | lm loss: 6.405270E+00 | loss scale: 16384.0 | grad norm: 135039.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4818/ 159576 | consumed samples: 113136 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.132E-05 | global batch size: 48 | lm loss: 6.376327E+00 | loss scale: 16384.0 | grad norm: 144860.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4819/ 159576 | consumed samples: 113184 | elapsed time per iteration (ms): 15402.3 | learning rate: 3.134E-05 | global batch size: 48 | lm loss: 6.422782E+00 | loss scale: 16384.0 | grad norm: 185430.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4820/ 159576 | consumed samples: 113232 | elapsed time per iteration (ms): 15872.7 | learning rate: 3.135E-05 | global batch size: 48 | lm loss: 6.447948E+00 | loss scale: 16384.0 | grad norm: 143563.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4821/ 159576 | consumed samples: 113280 | elapsed time per iteration (ms): 15475.0 | learning rate: 3.136E-05 | global batch size: 48 | lm loss: 6.419926E+00 | loss scale: 16384.0 | grad norm: 139618.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4822/ 159576 | consumed samples: 113328 | elapsed time per iteration (ms): 15479.8 | learning rate: 3.138E-05 | global batch size: 48 | lm loss: 6.307784E+00 | loss scale: 16384.0 | grad norm: 135923.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4823/ 159576 | consumed samples: 113376 | elapsed time per iteration (ms): 15830.9 | learning rate: 3.139E-05 | global batch size: 48 | lm loss: 6.485186E+00 | loss scale: 16384.0 | grad norm: 148878.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4824/ 159576 | consumed samples: 113424 | elapsed time per iteration (ms): 15412.5 | learning rate: 3.140E-05 | global batch size: 48 | lm loss: 6.344635E+00 | loss scale: 16384.0 | grad norm: 144634.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4825/ 159576 | consumed samples: 113472 | elapsed time per iteration (ms): 15399.2 | learning rate: 3.142E-05 | global batch size: 48 | lm loss: 6.380017E+00 | loss scale: 16384.0 | grad norm: 149087.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4826/ 159576 | consumed samples: 113520 | elapsed time per iteration (ms): 15495.5 | learning 
rate: 3.143E-05 | global batch size: 48 | lm loss: 6.478100E+00 | loss scale: 16384.0 | grad norm: 157916.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4827/ 159576 | consumed samples: 113568 | elapsed time per iteration (ms): 15748.7 | learning rate: 3.144E-05 | global batch size: 48 | lm loss: 6.353170E+00 | loss scale: 16384.0 | grad norm: 130626.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4828/ 159576 | consumed samples: 113616 | elapsed time per iteration (ms): 15356.7 | learning rate: 3.146E-05 | global batch size: 48 | lm loss: 6.307143E+00 | loss scale: 16384.0 | grad norm: 152222.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4829/ 159576 | consumed samples: 113664 | elapsed time per iteration (ms): 15426.2 | learning rate: 3.147E-05 | global batch size: 48 | lm loss: 6.284460E+00 | loss scale: 16384.0 | grad norm: 135151.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4830/ 159576 | consumed samples: 113712 | elapsed time per iteration (ms): 15453.2 | learning rate: 3.148E-05 | global batch size: 48 | lm loss: 6.389065E+00 | loss scale: 16384.0 | grad norm: 158822.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4831/ 159576 | consumed samples: 113760 | elapsed time per iteration (ms): 15757.8 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330949E+00 | loss scale: 16384.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4832/ 159576 | consumed samples: 113808 | elapsed time per iteration (ms): 8582.4 | learning rate: 3.150E-05 | global batch size: 48 | lm loss: 6.330990E+00 | loss scale: 8192.0 | grad norm: 150077.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4833/ 159576 | consumed samples: 113856 | elapsed time per iteration (ms): 14858.8 | learning rate: 3.151E-05 | global batch size: 48 | lm loss: 6.472740E+00 | loss scale: 8192.0 | grad norm: 80806.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4834/ 159576 | consumed samples: 113904 | elapsed time per iteration (ms): 15406.5 | learning rate: 3.152E-05 | global batch size: 48 | lm loss: 6.386261E+00 | loss scale: 8192.0 | grad norm: 79982.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4835/ 159576 | consumed samples: 113952 | elapsed time per iteration (ms): 15754.6 | learning rate: 3.154E-05 | global batch size: 48 | lm loss: 6.399200E+00 | loss scale: 8192.0 | grad norm: 76427.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4836/ 159576 | consumed samples: 114000 | elapsed time per iteration (ms): 15606.6 | learning rate: 3.155E-05 | global batch size: 48 | lm loss: 6.377688E+00 | loss scale: 8192.0 | grad norm: 72730.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
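The transition just above is worth pausing on: at iteration 4832 the loss scale drops from 16384.0 to 8192.0, that step takes 8582.4 ms against the usual ~15500 ms, and its grad norm repeats iteration 4831's value exactly (150077.176). That pattern is what dynamic fp16 loss scaling looks like in these logs: an Inf/NaN shows up in the scaled gradients, the update is abandoned partway through, and the scale is halved for the next attempt (the skipped/nan counters printed here read 0, so take the per-field bookkeeping with a grain of salt). A minimal sketch of the usual policy follows; the constants are illustrative, not this run's actual configuration.

class DynamicLossScaler:
    # Halve on overflow, grow again after a clean window -- the standard
    # Megatron/DeepSpeed-style policy, reduced to its core.
    def __init__(self, scale=16384.0, min_scale=1.0, growth_interval=1000):
        self.scale = scale
        self.min_scale = min_scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            # e.g. the 16384.0 -> 8192.0 drop at iteration 4832 above
            self.scale = max(self.scale / 2.0, self.min_scale)
            self.clean_steps = 0
            return False  # caller skips optimizer.step() for this iteration
        self.clean_steps += 1
        if self.clean_steps % self.growth_interval == 0:
            self.scale *= 2.0  # probe a larger scale after a clean window
        return True

Because the loss is multiplied by the scale before the backward pass and the gradients are divided by it again before use, halving the scale leaves the effective update unchanged. Note also that the grad norms logged after iteration 4832 fall from roughly 1.5e5 to roughly 8e4, the same factor of two as the scale, which suggests the norms printed in this log are computed on the still-scaled gradients.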
iteration 4837/ 159576 | consumed samples: 114048 | elapsed time per iteration (ms): 15427.9 | learning rate: 3.156E-05 | global batch size: 48 | lm loss: 6.362796E+00 | loss scale: 8192.0 | grad norm: 75031.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4838/ 159576 | consumed samples: 114096 | elapsed time per iteration (ms): 15459.9 | learning rate: 3.158E-05 | global batch size: 48 | lm loss: 6.427638E+00 | loss scale: 8192.0 | grad norm: 71627.109 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4839/ 159576 | consumed samples: 114144 | elapsed time per iteration (ms): 15785.4 | learning rate: 3.159E-05 | global batch size: 48 | lm loss: 6.319674E+00 | loss scale: 8192.0 | grad norm: 75857.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4840/ 159576 | consumed samples: 114192 | elapsed time per iteration (ms): 15529.1 | learning rate: 3.160E-05 | global batch size: 48 | lm loss: 6.453057E+00 | loss scale: 8192.0 | grad norm: 81110.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4841/ 159576 | consumed samples: 114240 | elapsed time per iteration (ms): 15426.5 | learning rate: 3.162E-05 | global batch size: 48 | lm loss: 6.411851E+00 | loss scale: 8192.0 | grad norm: 86983.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4842/ 159576 | consumed samples: 114288 | elapsed time per iteration (ms): 15460.5 | learning rate: 3.163E-05 | global batch size: 48 | lm loss: 6.377954E+00 | loss scale: 8192.0 | grad norm: 86981.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4843/ 159576 | consumed samples: 114336 | elapsed time per iteration (ms): 15821.2 | learning rate: 3.164E-05 | global batch size: 48 | lm loss: 6.577933E+00 | loss scale: 8192.0 | grad norm: 91346.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4844/ 159576 | consumed samples: 114384 | elapsed time per iteration (ms): 15501.1 | learning rate: 3.166E-05 | global batch size: 48 | lm loss: 6.404775E+00 | loss scale: 8192.0 | grad norm: 73191.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4845/ 159576 | consumed samples: 114432 | elapsed time per iteration (ms): 15559.3 | learning rate: 3.167E-05 | global batch size: 48 | lm loss: 6.405911E+00 | loss scale: 8192.0 | grad norm: 77252.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4846/ 159576 | consumed samples: 114480 | elapsed time per iteration (ms): 15521.7 | learning rate: 3.168E-05 | global batch size: 48 | lm loss: 6.505279E+00 | loss scale: 8192.0 | grad norm: 70335.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4847/ 159576 | consumed samples: 114528 | elapsed time per iteration (ms): 15925.0 | learning rate: 3.170E-05 | global batch size: 48 | lm loss: 6.438465E+00 | loss scale: 8192.0 | grad norm: 73213.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4848/ 159576 | consumed samples: 114576 | elapsed time per iteration (ms): 15612.2 | learning rate: 3.171E-05 | global batch size: 48 | lm loss: 6.452498E+00 | loss scale: 8192.0 | grad norm: 78502.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4849/ 159576 | consumed samples: 114624 | elapsed time per iteration (ms): 15443.4 | learning rate: 3.172E-05 | global batch size: 48 | lm loss: 6.394375E+00 | loss scale:
8192.0 | grad norm: 87781.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4850/ 159576 | consumed samples: 114672 | elapsed time per iteration (ms): 15479.4 | learning rate: 3.174E-05 | global batch size: 48 | lm loss: 6.435881E+00 | loss scale: 8192.0 | grad norm: 73932.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4851/ 159576 | consumed samples: 114720 | elapsed time per iteration (ms): 15706.9 | learning rate: 3.175E-05 | global batch size: 48 | lm loss: 6.482435E+00 | loss scale: 8192.0 | grad norm: 80407.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4852/ 159576 | consumed samples: 114768 | elapsed time per iteration (ms): 15526.6 | learning rate: 3.176E-05 | global batch size: 48 | lm loss: 6.479346E+00 | loss scale: 8192.0 | grad norm: 88804.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4853/ 159576 | consumed samples: 114816 | elapsed time per iteration (ms): 15581.7 | learning rate: 3.178E-05 | global batch size: 48 | lm loss: 6.398011E+00 | loss scale: 8192.0 | grad norm: 85238.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4854/ 159576 | consumed samples: 114864 | elapsed time per iteration (ms): 15591.6 | learning rate: 3.179E-05 | global batch size: 48 | lm loss: 6.439957E+00 | loss scale: 8192.0 | grad norm: 79088.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4855/ 159576 | consumed samples: 114912 | elapsed time per iteration (ms): 15588.2 | learning rate: 3.180E-05 | global batch size: 48 | lm loss: 6.525852E+00 | loss scale: 8192.0 | grad norm: 86759.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4856/ 159576 | consumed samples: 114960 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.182E-05 | global batch size: 48 | lm loss: 6.406517E+00 | loss scale: 8192.0 | grad norm: 84644.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4857/ 159576 | consumed samples: 115008 | elapsed time per iteration (ms): 15455.8 | learning rate: 3.183E-05 | global batch size: 48 | lm loss: 6.427845E+00 | loss scale: 8192.0 | grad norm: 95490.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4858/ 159576 | consumed samples: 115056 | elapsed time per iteration (ms): 15508.2 | learning rate: 3.184E-05 | global batch size: 48 | lm loss: 6.500411E+00 | loss scale: 8192.0 | grad norm: 101236.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4859/ 159576 | consumed samples: 115104 | elapsed time per iteration (ms): 15652.7 | learning rate: 3.186E-05 | global batch size: 48 | lm loss: 6.364994E+00 | loss scale: 8192.0 | grad norm: 91582.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4860/ 159576 | consumed samples: 115152 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.187E-05 | global batch size: 48 | lm loss: 6.449871E+00 | loss scale: 8192.0 | grad norm: 66096.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4861/ 159576 | consumed samples: 115200 | elapsed time per 
iteration (ms): 15569.1 | learning rate: 3.188E-05 | global batch size: 48 | lm loss: 6.364583E+00 | loss scale: 8192.0 | grad norm: 83574.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4862/ 159576 | consumed samples: 115248 | elapsed time per iteration (ms): 15872.9 | learning rate: 3.189E-05 | global batch size: 48 | lm loss: 6.322206E+00 | loss scale: 8192.0 | grad norm: 76576.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4863/ 159576 | consumed samples: 115296 | elapsed time per iteration (ms): 15519.6 | learning rate: 3.191E-05 | global batch size: 48 | lm loss: 6.475718E+00 | loss scale: 8192.0 | grad norm: 68002.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4864/ 159576 | consumed samples: 115344 | elapsed time per iteration (ms): 15516.6 | learning rate: 3.192E-05 | global batch size: 48 | lm loss: 6.312770E+00 | loss scale: 8192.0 | grad norm: 83359.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4865/ 159576 | consumed samples: 115392 | elapsed time per iteration (ms): 15489.9 | learning rate: 3.193E-05 | global batch size: 48 | lm loss: 6.447346E+00 | loss scale: 8192.0 | grad norm: 79898.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4866/ 159576 | consumed samples: 115440 | elapsed time per iteration (ms): 15854.0 | learning rate: 3.195E-05 | global batch size: 48 | lm loss: 6.343767E+00 | loss scale: 8192.0 | grad norm: 82915.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4867/ 159576 | consumed samples: 115488 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.196E-05 | global batch size: 48 | lm loss: 6.421945E+00 | loss scale: 8192.0 | grad norm: 76629.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4868/ 159576 | consumed samples: 115536 | elapsed time per iteration (ms): 15524.2 | learning rate: 3.197E-05 | global batch size: 48 | lm loss: 6.402726E+00 | loss scale: 8192.0 | grad norm: 75429.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4869/ 159576 | consumed samples: 115584 | elapsed time per iteration (ms): 15553.9 | learning rate: 3.199E-05 | global batch size: 48 | lm loss: 6.417988E+00 | loss scale: 8192.0 | grad norm: 82790.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4870/ 159576 | consumed samples: 115632 | elapsed time per iteration (ms): 15916.9 | learning rate: 3.200E-05 | global batch size: 48 | lm loss: 6.289523E+00 | loss scale: 8192.0 | grad norm: 77156.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4871/ 159576 | consumed samples: 115680 | elapsed time per iteration (ms): 15548.8 | learning rate: 3.201E-05 | global batch size: 48 | lm loss: 6.359477E+00 | loss scale: 8192.0 | grad norm: 94063.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4872/ 159576 | consumed samples: 115728 | elapsed time per iteration (ms): 15482.5 | learning rate: 3.203E-05 | global batch size: 48 | lm loss: 6.386482E+00 | loss scale: 8192.0 | grad norm: 70658.588 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 4873/ 159576 | consumed samples: 115776 | elapsed time per iteration (ms): 15555.0 | learning rate: 3.204E-05 | global batch size: 48 | lm loss: 6.524825E+00 | loss scale: 8192.0 | grad norm: 86322.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4874/ 159576 | consumed samples: 115824 | elapsed time per iteration (ms): 15950.6 | learning rate: 3.205E-05 | global batch size: 48 | lm loss: 6.358710E+00 | loss scale: 8192.0 | grad norm: 73619.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4875/ 159576 | consumed samples: 115872 | elapsed time per iteration (ms): 15559.5 | learning rate: 3.207E-05 | global batch size: 48 | lm loss: 6.536497E+00 | loss scale: 8192.0 | grad norm: 89786.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4876/ 159576 | consumed samples: 115920 | elapsed time per iteration (ms): 15463.5 | learning rate: 3.208E-05 | global batch size: 48 | lm loss: 6.427877E+00 | loss scale: 8192.0 | grad norm: 78839.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4877/ 159576 | consumed samples: 115968 | elapsed time per iteration (ms): 15525.4 | learning rate: 3.209E-05 | global batch size: 48 | lm loss: 6.471958E+00 | loss scale: 8192.0 | grad norm: 76472.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4878/ 159576 | consumed samples: 116016 | elapsed time per iteration (ms): 15732.8 | learning rate: 3.211E-05 | global batch size: 48 | lm loss: 6.437389E+00 | loss scale: 8192.0 | grad norm: 86320.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4879/ 159576 | consumed samples: 116064 | elapsed time per iteration (ms): 15464.9 | learning rate: 3.212E-05 | global batch size: 48 | lm loss: 6.365283E+00 | loss scale: 8192.0 | grad norm: 82080.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4880/ 159576 | consumed samples: 116112 | elapsed time per iteration (ms): 15552.2 | learning rate: 3.213E-05 | global batch size: 48 | lm loss: 6.408097E+00 | loss scale: 8192.0 | grad norm: 79728.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4881/ 159576 | consumed samples: 116160 | elapsed time per iteration (ms): 15532.2 | learning rate: 3.215E-05 | global batch size: 48 | lm loss: 6.425485E+00 | loss scale: 8192.0 | grad norm: 102265.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4882/ 159576 | consumed samples: 116208 | elapsed time per iteration (ms): 15707.7 | learning rate: 3.216E-05 | global batch size: 48 | lm loss: 6.276470E+00 | loss scale: 8192.0 | grad norm: 93438.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4883/ 159576 | consumed samples: 116256 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.217E-05 | global batch size: 48 | lm loss: 6.487882E+00 | loss scale: 8192.0 | grad norm: 85760.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
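One more pattern that is easy to read off this stretch: the learning rate climbs by a near-constant ~1.3e-8 per step, i.e. linearly in consumed samples (2.992E-05 at 108,144 samples near iteration 4713, 3.280E-05 at 118,512 samples at iteration 4930), which is what a Megatron-style sample-based linear warmup produces. A sketch of that schedule follows; the peak rate and warmup length are assumed round numbers chosen to reproduce the observed slope, not the run's actual settings.

def warmup_lr(consumed_samples, peak_lr=6e-5, warmup_samples=216_320):
    # Linear ramp from 0 to peak_lr over the first warmup_samples samples.
    return peak_lr * min(consumed_samples, warmup_samples) / warmup_samples

print(warmup_lr(108_144))  # ~3.00e-05, close to the 2.992E-05 logged near iteration 4713
print(warmup_lr(118_512))  # ~3.29e-05, close to the 3.280E-05 logged at iteration 4930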
iteration 4884/ 159576 | consumed samples: 116304 | elapsed time per iteration (ms): 15486.2 | learning rate: 3.219E-05 | global batch size: 48 | lm loss: 6.412776E+00 | loss scale: 8192.0 | grad norm: 84281.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4885/ 159576 | consumed samples: 116352 | elapsed time per iteration (ms): 15807.2 | learning rate: 3.220E-05 | global batch size: 48 | lm loss: 6.340213E+00 | loss scale: 8192.0 | grad norm: 79000.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4886/ 159576 | consumed samples: 116400 | elapsed time per iteration (ms): 15690.6 | learning rate: 3.221E-05 | global batch size: 48 | lm loss: 6.368945E+00 | loss scale: 8192.0 | grad norm: 101421.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4887/ 159576 | consumed samples: 116448 | elapsed time per iteration (ms): 15490.9 | learning rate: 3.223E-05 | global batch size: 48 | lm loss: 6.181931E+00 | loss scale: 8192.0 | grad norm: 80306.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4888/ 159576 | consumed samples: 116496 | elapsed time per iteration (ms): 15541.0 | learning rate: 3.224E-05 | global batch size: 48 | lm loss: 6.508174E+00 | loss scale: 8192.0 | grad norm: 88863.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4889/ 159576 | consumed samples: 116544 | elapsed time per iteration (ms): 15795.9 | learning rate: 3.225E-05 | global batch size: 48 | lm loss: 6.362309E+00 | loss scale: 8192.0 | grad norm: 82730.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4890/ 159576 | consumed samples: 116592 | elapsed time per iteration (ms): 15612.5 | learning rate: 3.227E-05 | global batch size: 48 | lm loss: 6.457442E+00 | loss scale: 8192.0 | grad norm: 77751.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4891/ 159576 | consumed samples: 116640 | elapsed time per iteration (ms): 15523.7 | learning rate: 3.228E-05 | global batch size: 48 | lm loss: 6.382168E+00 | loss scale: 8192.0 | grad norm: 95335.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4892/ 159576 | consumed samples: 116688 | elapsed time per iteration (ms): 15565.3 | learning rate: 3.229E-05 | global batch size: 48 | lm loss: 6.443634E+00 | loss scale: 8192.0 | grad norm: 141532.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4893/ 159576 | consumed samples: 116736 | elapsed time per iteration (ms): 15920.8 | learning rate: 3.231E-05 | global batch size: 48 | lm loss: 6.475467E+00 | loss scale: 8192.0 | grad norm: 99006.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4894/ 159576 | consumed samples: 116784 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.232E-05 | global batch size: 48 | lm loss: 6.465964E+00 | loss scale: 8192.0 | grad norm: 104819.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4895/ 159576 | consumed samples: 116832 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.233E-05 | global batch size: 48 | lm loss: 6.355396E+00 | loss scale: 8192.0 | grad norm: 88645.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4896/ 159576 | consumed samples:
116880 | elapsed time per iteration (ms): 15530.2 | learning rate: 3.235E-05 | global batch size: 48 | lm loss: 6.397956E+00 | loss scale: 8192.0 | grad norm: 97080.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4897/ 159576 | consumed samples: 116928 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.236E-05 | global batch size: 48 | lm loss: 6.376213E+00 | loss scale: 8192.0 | grad norm: 91571.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4898/ 159576 | consumed samples: 116976 | elapsed time per iteration (ms): 15582.4 | learning rate: 3.237E-05 | global batch size: 48 | lm loss: 6.338162E+00 | loss scale: 8192.0 | grad norm: 95029.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4899/ 159576 | consumed samples: 117024 | elapsed time per iteration (ms): 15514.7 | learning rate: 3.239E-05 | global batch size: 48 | lm loss: 6.420194E+00 | loss scale: 8192.0 | grad norm: 115966.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4900/ 159576 | consumed samples: 117072 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.240E-05 | global batch size: 48 | lm loss: 6.472268E+00 | loss scale: 8192.0 | grad norm: 117112.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4901/ 159576 | consumed samples: 117120 | elapsed time per iteration (ms): 15707.8 | learning rate: 3.241E-05 | global batch size: 48 | lm loss: 6.365590E+00 | loss scale: 8192.0 | grad norm: 126111.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4902/ 159576 | consumed samples: 117168 | elapsed time per iteration (ms): 15440.6 | learning rate: 3.243E-05 | global batch size: 48 | lm loss: 6.341323E+00 | loss scale: 8192.0 | grad norm: 141040.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4903/ 159576 | consumed samples: 117216 | elapsed time per iteration (ms): 15486.6 | learning rate: 3.244E-05 | global batch size: 48 | lm loss: 6.294356E+00 | loss scale: 8192.0 | grad norm: 92893.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4904/ 159576 | consumed samples: 117264 | elapsed time per iteration (ms): 15374.1 | learning rate: 3.245E-05 | global batch size: 48 | lm loss: 6.459288E+00 | loss scale: 8192.0 | grad norm: 105593.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4905/ 159576 | consumed samples: 117312 | elapsed time per iteration (ms): 15525.3 | learning rate: 3.247E-05 | global batch size: 48 | lm loss: 6.321597E+00 | loss scale: 8192.0 | grad norm: 92345.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4906/ 159576 | consumed samples: 117360 | elapsed time per iteration (ms): 15464.1 | learning rate: 3.248E-05 | global batch size: 48 | lm loss: 6.394690E+00 | loss scale: 8192.0 | grad norm: 115046.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4907/ 159576 | consumed samples: 117408 | elapsed time per iteration (ms): 15463.2 | learning rate: 3.249E-05 | global batch size: 48 | lm loss: 6.382209E+00 | loss scale: 8192.0 | grad norm: 129712.277 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4908/ 159576 | consumed samples: 117456 | elapsed time per iteration (ms): 15513.8 | learning rate: 3.251E-05 | global batch size: 48 | lm loss: 6.406621E+00 | loss scale: 8192.0 | grad norm: 97342.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4909/ 159576 | consumed samples: 117504 | elapsed time per iteration (ms): 15695.2 | learning rate: 3.252E-05 | global batch size: 48 | lm loss: 6.313143E+00 | loss scale: 8192.0 | grad norm: 113026.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4910/ 159576 | consumed samples: 117552 | elapsed time per iteration (ms): 15443.0 | learning rate: 3.253E-05 | global batch size: 48 | lm loss: 6.450486E+00 | loss scale: 8192.0 | grad norm: 95063.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4911/ 159576 | consumed samples: 117600 | elapsed time per iteration (ms): 15416.6 | learning rate: 3.255E-05 | global batch size: 48 | lm loss: 6.485876E+00 | loss scale: 8192.0 | grad norm: 102064.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4912/ 159576 | consumed samples: 117648 | elapsed time per iteration (ms): 15823.7 | learning rate: 3.256E-05 | global batch size: 48 | lm loss: 6.276315E+00 | loss scale: 8192.0 | grad norm: 114959.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4913/ 159576 | consumed samples: 117696 | elapsed time per iteration (ms): 15625.5 | learning rate: 3.257E-05 | global batch size: 48 | lm loss: 6.405933E+00 | loss scale: 8192.0 | grad norm: 117232.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4914/ 159576 | consumed samples: 117744 | elapsed time per iteration (ms): 15455.3 | learning rate: 3.259E-05 | global batch size: 48 | lm loss: 6.233083E+00 | loss scale: 8192.0 | grad norm: 109853.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4915/ 159576 | consumed samples: 117792 | elapsed time per iteration (ms): 15594.3 | learning rate: 3.260E-05 | global batch size: 48 | lm loss: 6.418136E+00 | loss scale: 8192.0 | grad norm: 108180.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4916/ 159576 | consumed samples: 117840 | elapsed time per iteration (ms): 15954.3 | learning rate: 3.261E-05 | global batch size: 48 | lm loss: 6.385183E+00 | loss scale: 8192.0 | grad norm: 103614.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4917/ 159576 | consumed samples: 117888 | elapsed time per iteration (ms): 15458.8 | learning rate: 3.263E-05 | global batch size: 48 | lm loss: 6.341071E+00 | loss scale: 8192.0 | grad norm: 87833.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4918/ 159576 | consumed samples: 117936 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.264E-05 | global batch size: 48 | lm loss: 6.418250E+00 | loss scale: 8192.0 | grad norm: 91681.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4919/ 159576 | consumed samples: 117984 | elapsed time per iteration (ms): 15446.3 | learning rate: 
3.265E-05 | global batch size: 48 | lm loss: 6.298886E+00 | loss scale: 8192.0 | grad norm: 98048.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4920/ 159576 | consumed samples: 118032 | elapsed time per iteration (ms): 15905.0 | learning rate: 3.267E-05 | global batch size: 48 | lm loss: 6.413123E+00 | loss scale: 8192.0 | grad norm: 103541.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4921/ 159576 | consumed samples: 118080 | elapsed time per iteration (ms): 15416.1 | learning rate: 3.268E-05 | global batch size: 48 | lm loss: 6.282074E+00 | loss scale: 8192.0 | grad norm: 100452.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4922/ 159576 | consumed samples: 118128 | elapsed time per iteration (ms): 15499.9 | learning rate: 3.269E-05 | global batch size: 48 | lm loss: 6.371088E+00 | loss scale: 8192.0 | grad norm: 118401.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4923/ 159576 | consumed samples: 118176 | elapsed time per iteration (ms): 15522.6 | learning rate: 3.271E-05 | global batch size: 48 | lm loss: 6.399379E+00 | loss scale: 8192.0 | grad norm: 100877.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4924/ 159576 | consumed samples: 118224 | elapsed time per iteration (ms): 15859.1 | learning rate: 3.272E-05 | global batch size: 48 | lm loss: 6.450886E+00 | loss scale: 8192.0 | grad norm: 115997.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4925/ 159576 | consumed samples: 118272 | elapsed time per iteration (ms): 15622.0 | learning rate: 3.273E-05 | global batch size: 48 | lm loss: 6.412412E+00 | loss scale: 8192.0 | grad norm: 121229.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4926/ 159576 | consumed samples: 118320 | elapsed time per iteration (ms): 15522.5 | learning rate: 3.275E-05 | global batch size: 48 | lm loss: 6.276751E+00 | loss scale: 8192.0 | grad norm: 127323.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4927/ 159576 | consumed samples: 118368 | elapsed time per iteration (ms): 15489.0 | learning rate: 3.276E-05 | global batch size: 48 | lm loss: 6.328137E+00 | loss scale: 8192.0 | grad norm: 109231.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4928/ 159576 | consumed samples: 118416 | elapsed time per iteration (ms): 15679.3 | learning rate: 3.277E-05 | global batch size: 48 | lm loss: 6.343997E+00 | loss scale: 8192.0 | grad norm: 94463.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4929/ 159576 | consumed samples: 118464 | elapsed time per iteration (ms): 15506.4 | learning rate: 3.279E-05 | global batch size: 48 | lm loss: 6.367960E+00 | loss scale: 8192.0 | grad norm: 104644.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4930/ 159576 | consumed samples: 118512 | elapsed time per iteration (ms): 15552.6 | learning rate: 3.280E-05 | global batch size: 48 | lm loss: 6.375040E+00 | loss scale: 8192.0 | grad norm: 108080.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 4931/ 159576 | consumed samples: 118560 | elapsed time per iteration (ms): 15566.6 | learning rate: 3.281E-05 | global batch size: 48 | lm loss: 6.468022E+00 | loss scale: 8192.0 | grad norm: 98813.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4932/ 159576 | consumed samples: 118608 | elapsed time per iteration (ms): 15633.8 | learning rate: 3.283E-05 | global batch size: 48 | lm loss: 6.478949E+00 | loss scale: 8192.0 | grad norm: 119522.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4933/ 159576 | consumed samples: 118656 | elapsed time per iteration (ms): 15451.3 | learning rate: 3.284E-05 | global batch size: 48 | lm loss: 6.415487E+00 | loss scale: 8192.0 | grad norm: 121029.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4934/ 159576 | consumed samples: 118704 | elapsed time per iteration (ms): 15537.9 | learning rate: 3.285E-05 | global batch size: 48 | lm loss: 6.436414E+00 | loss scale: 8192.0 | grad norm: 114108.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4935/ 159576 | consumed samples: 118752 | elapsed time per iteration (ms): 15442.4 | learning rate: 3.287E-05 | global batch size: 48 | lm loss: 6.380546E+00 | loss scale: 8192.0 | grad norm: 102153.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4936/ 159576 | consumed samples: 118800 | elapsed time per iteration (ms): 15674.3 | learning rate: 3.288E-05 | global batch size: 48 | lm loss: 6.524636E+00 | loss scale: 8192.0 | grad norm: 89702.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4937/ 159576 | consumed samples: 118848 | elapsed time per iteration (ms): 15501.6 | learning rate: 3.289E-05 | global batch size: 48 | lm loss: 6.352899E+00 | loss scale: 8192.0 | grad norm: 106241.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4938/ 159576 | consumed samples: 118896 | elapsed time per iteration (ms): 15494.9 | learning rate: 3.291E-05 | global batch size: 48 | lm loss: 6.292633E+00 | loss scale: 8192.0 | grad norm: 95129.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4939/ 159576 | consumed samples: 118944 | elapsed time per iteration (ms): 15936.8 | learning rate: 3.292E-05 | global batch size: 48 | lm loss: 6.337314E+00 | loss scale: 8192.0 | grad norm: 120723.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4940/ 159576 | consumed samples: 118992 | elapsed time per iteration (ms): 15531.1 | learning rate: 3.293E-05 | global batch size: 48 | lm loss: 6.391080E+00 | loss scale: 8192.0 | grad norm: 145548.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4941/ 159576 | consumed samples: 119040 | elapsed time per iteration (ms): 15466.0 | learning rate: 3.295E-05 | global batch size: 48 | lm loss: 6.343481E+00 | loss scale: 8192.0 | grad norm: 211104.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4942/ 159576 | consumed samples: 119088 | elapsed time per iteration (ms): 15505.4 | learning rate: 3.296E-05 | global batch size: 48 | lm loss: 6.528688E+00 | loss scale: 
8192.0 | grad norm: 140909.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4943/ 159576 | consumed samples: 119136 | elapsed time per iteration (ms): 15830.2 | learning rate: 3.297E-05 | global batch size: 48 | lm loss: 6.411016E+00 | loss scale: 8192.0 | grad norm: 127370.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4944/ 159576 | consumed samples: 119184 | elapsed time per iteration (ms): 15400.2 | learning rate: 3.299E-05 | global batch size: 48 | lm loss: 6.483131E+00 | loss scale: 8192.0 | grad norm: 104651.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4945/ 159576 | consumed samples: 119232 | elapsed time per iteration (ms): 15491.5 | learning rate: 3.300E-05 | global batch size: 48 | lm loss: 6.509373E+00 | loss scale: 8192.0 | grad norm: 129067.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4946/ 159576 | consumed samples: 119280 | elapsed time per iteration (ms): 15557.0 | learning rate: 3.301E-05 | global batch size: 48 | lm loss: 6.338033E+00 | loss scale: 8192.0 | grad norm: 111737.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4947/ 159576 | consumed samples: 119328 | elapsed time per iteration (ms): 15880.4 | learning rate: 3.303E-05 | global batch size: 48 | lm loss: 6.346412E+00 | loss scale: 8192.0 | grad norm: 105173.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4948/ 159576 | consumed samples: 119376 | elapsed time per iteration (ms): 15470.3 | learning rate: 3.304E-05 | global batch size: 48 | lm loss: 6.433241E+00 | loss scale: 8192.0 | grad norm: 117253.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4949/ 159576 | consumed samples: 119424 | elapsed time per iteration (ms): 15464.0 | learning rate: 3.305E-05 | global batch size: 48 | lm loss: 6.408391E+00 | loss scale: 8192.0 | grad norm: 100408.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4950/ 159576 | consumed samples: 119472 | elapsed time per iteration (ms): 15498.5 | learning rate: 3.307E-05 | global batch size: 48 | lm loss: 6.403716E+00 | loss scale: 8192.0 | grad norm: 124240.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4951/ 159576 | consumed samples: 119520 | elapsed time per iteration (ms): 15815.9 | learning rate: 3.308E-05 | global batch size: 48 | lm loss: 6.389519E+00 | loss scale: 8192.0 | grad norm: 100463.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4952/ 159576 | consumed samples: 119568 | elapsed time per iteration (ms): 15557.3 | learning rate: 3.309E-05 | global batch size: 48 | lm loss: 6.505785E+00 | loss scale: 8192.0 | grad norm: 106487.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4953/ 159576 | consumed samples: 119616 | elapsed time per iteration (ms): 15479.5 | learning rate: 3.311E-05 | global batch size: 48 | lm loss: 6.381755E+00 | loss scale: 8192.0 | grad norm: 102228.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4954/ 159576 | consumed samples: 119664 | elapsed time 
per iteration (ms): 15481.8 | learning rate: 3.312E-05 | global batch size: 48 | lm loss: 6.379836E+00 | loss scale: 8192.0 | grad norm: 118394.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4955/ 159576 | consumed samples: 119712 | elapsed time per iteration (ms): 15784.5 | learning rate: 3.313E-05 | global batch size: 48 | lm loss: 6.475849E+00 | loss scale: 8192.0 | grad norm: 122087.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4956/ 159576 | consumed samples: 119760 | elapsed time per iteration (ms): 15436.2 | learning rate: 3.315E-05 | global batch size: 48 | lm loss: 6.490977E+00 | loss scale: 8192.0 | grad norm: 123577.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4957/ 159576 | consumed samples: 119808 | elapsed time per iteration (ms): 15420.1 | learning rate: 3.316E-05 | global batch size: 48 | lm loss: 6.418243E+00 | loss scale: 8192.0 | grad norm: 146260.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4958/ 159576 | consumed samples: 119856 | elapsed time per iteration (ms): 15433.1 | learning rate: 3.317E-05 | global batch size: 48 | lm loss: 6.375823E+00 | loss scale: 8192.0 | grad norm: 102943.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4959/ 159576 | consumed samples: 119904 | elapsed time per iteration (ms): 15549.7 | learning rate: 3.319E-05 | global batch size: 48 | lm loss: 6.454865E+00 | loss scale: 8192.0 | grad norm: 95733.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4960/ 159576 | consumed samples: 119952 | elapsed time per iteration (ms): 15477.0 | learning rate: 3.320E-05 | global batch size: 48 | lm loss: 6.376845E+00 | loss scale: 8192.0 | grad norm: 105409.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4961/ 159576 | consumed samples: 120000 | elapsed time per iteration (ms): 15553.6 | learning rate: 3.321E-05 | global batch size: 48 | lm loss: 6.369764E+00 | loss scale: 8192.0 | grad norm: 100426.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4962/ 159576 | consumed samples: 120048 | elapsed time per iteration (ms): 15567.9 | learning rate: 3.323E-05 | global batch size: 48 | lm loss: 6.386555E+00 | loss scale: 8192.0 | grad norm: 100112.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4963/ 159576 | consumed samples: 120096 | elapsed time per iteration (ms): 15733.5 | learning rate: 3.324E-05 | global batch size: 48 | lm loss: 6.487816E+00 | loss scale: 8192.0 | grad norm: 117343.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4964/ 159576 | consumed samples: 120144 | elapsed time per iteration (ms): 15368.5 | learning rate: 3.325E-05 | global batch size: 48 | lm loss: 6.415962E+00 | loss scale: 8192.0 | grad norm: 98866.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4965/ 159576 | consumed samples: 120192 | elapsed time per iteration (ms): 15477.1 | learning rate: 3.327E-05 | global batch size: 48 | lm loss: 6.374081E+00 | loss scale: 8192.0 | grad norm: 124767.543 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4966/ 159576 | consumed samples: 120240 | elapsed time per iteration (ms): 15922.3 | learning rate: 3.328E-05 | global batch size: 48 | lm loss: 6.338925E+00 | loss scale: 8192.0 | grad norm: 229637.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4967/ 159576 | consumed samples: 120288 | elapsed time per iteration (ms): 15438.9 | learning rate: 3.329E-05 | global batch size: 48 | lm loss: 6.318257E+00 | loss scale: 8192.0 | grad norm: 138618.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4968/ 159576 | consumed samples: 120336 | elapsed time per iteration (ms): 15527.5 | learning rate: 3.331E-05 | global batch size: 48 | lm loss: 6.406815E+00 | loss scale: 8192.0 | grad norm: 101628.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4969/ 159576 | consumed samples: 120384 | elapsed time per iteration (ms): 15565.4 | learning rate: 3.332E-05 | global batch size: 48 | lm loss: 6.381866E+00 | loss scale: 8192.0 | grad norm: 138150.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4970/ 159576 | consumed samples: 120432 | elapsed time per iteration (ms): 15898.0 | learning rate: 3.333E-05 | global batch size: 48 | lm loss: 6.305198E+00 | loss scale: 8192.0 | grad norm: 94133.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4971/ 159576 | consumed samples: 120480 | elapsed time per iteration (ms): 15413.4 | learning rate: 3.335E-05 | global batch size: 48 | lm loss: 6.276737E+00 | loss scale: 8192.0 | grad norm: 89212.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4972/ 159576 | consumed samples: 120528 | elapsed time per iteration (ms): 15553.0 | learning rate: 3.336E-05 | global batch size: 48 | lm loss: 6.404760E+00 | loss scale: 8192.0 | grad norm: 119702.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4973/ 159576 | consumed samples: 120576 | elapsed time per iteration (ms): 15428.6 | learning rate: 3.337E-05 | global batch size: 48 | lm loss: 6.225817E+00 | loss scale: 8192.0 | grad norm: 94382.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4974/ 159576 | consumed samples: 120624 | elapsed time per iteration (ms): 15767.2 | learning rate: 3.339E-05 | global batch size: 48 | lm loss: 6.442757E+00 | loss scale: 8192.0 | grad norm: 99692.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4975/ 159576 | consumed samples: 120672 | elapsed time per iteration (ms): 15514.4 | learning rate: 3.340E-05 | global batch size: 48 | lm loss: 6.472607E+00 | loss scale: 8192.0 | grad norm: 112543.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4976/ 159576 | consumed samples: 120720 | elapsed time per iteration (ms): 15602.8 | learning rate: 3.341E-05 | global batch size: 48 | lm loss: 6.382205E+00 | loss scale: 8192.0 | grad norm: 97309.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4977/ 159576 | consumed samples: 120768 | elapsed time per iteration (ms): 15584.4 | learning rate: 3.343E-05 | global batch 
size: 48 | lm loss: 6.527099E+00 | loss scale: 8192.0 | grad norm: 91482.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4978/ 159576 | consumed samples: 120816 | elapsed time per iteration (ms): 15753.9 | learning rate: 3.344E-05 | global batch size: 48 | lm loss: 6.475079E+00 | loss scale: 8192.0 | grad norm: 167594.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4979/ 159576 | consumed samples: 120864 | elapsed time per iteration (ms): 15592.8 | learning rate: 3.345E-05 | global batch size: 48 | lm loss: 6.377496E+00 | loss scale: 8192.0 | grad norm: 94710.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4980/ 159576 | consumed samples: 120912 | elapsed time per iteration (ms): 15439.6 | learning rate: 3.347E-05 | global batch size: 48 | lm loss: 6.396212E+00 | loss scale: 8192.0 | grad norm: 82226.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4981/ 159576 | consumed samples: 120960 | elapsed time per iteration (ms): 15453.4 | learning rate: 3.348E-05 | global batch size: 48 | lm loss: 6.392390E+00 | loss scale: 8192.0 | grad norm: 93532.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4982/ 159576 | consumed samples: 121008 | elapsed time per iteration (ms): 15623.6 | learning rate: 3.349E-05 | global batch size: 48 | lm loss: 6.384733E+00 | loss scale: 8192.0 | grad norm: 99819.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4983/ 159576 | consumed samples: 121056 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.351E-05 | global batch size: 48 | lm loss: 6.365707E+00 | loss scale: 8192.0 | grad norm: 115195.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4984/ 159576 | consumed samples: 121104 | elapsed time per iteration (ms): 15519.9 | learning rate: 3.352E-05 | global batch size: 48 | lm loss: 6.280232E+00 | loss scale: 8192.0 | grad norm: 88569.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4985/ 159576 | consumed samples: 121152 | elapsed time per iteration (ms): 15489.3 | learning rate: 3.353E-05 | global batch size: 48 | lm loss: 6.514761E+00 | loss scale: 8192.0 | grad norm: 110101.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4986/ 159576 | consumed samples: 121200 | elapsed time per iteration (ms): 15582.9 | learning rate: 3.355E-05 | global batch size: 48 | lm loss: 6.394022E+00 | loss scale: 8192.0 | grad norm: 104900.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4987/ 159576 | consumed samples: 121248 | elapsed time per iteration (ms): 15478.8 | learning rate: 3.356E-05 | global batch size: 48 | lm loss: 6.428993E+00 | loss scale: 8192.0 | grad norm: 99980.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4988/ 159576 | consumed samples: 121296 | elapsed time per iteration (ms): 15470.8 | learning rate: 3.357E-05 | global batch size: 48 | lm loss: 6.383337E+00 | loss scale: 8192.0 | grad norm: 96150.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4989/ 159576 | 
consumed samples: 121344 | elapsed time per iteration (ms): 15490.7 | learning rate: 3.359E-05 | global batch size: 48 | lm loss: 6.440140E+00 | loss scale: 8192.0 | grad norm: 99225.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4990/ 159576 | consumed samples: 121392 | elapsed time per iteration (ms): 16022.8 | learning rate: 3.360E-05 | global batch size: 48 | lm loss: 6.329103E+00 | loss scale: 8192.0 | grad norm: 77357.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4991/ 159576 | consumed samples: 121440 | elapsed time per iteration (ms): 15500.7 | learning rate: 3.361E-05 | global batch size: 48 | lm loss: 6.346808E+00 | loss scale: 8192.0 | grad norm: 83379.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4992/ 159576 | consumed samples: 121488 | elapsed time per iteration (ms): 15638.6 | learning rate: 3.363E-05 | global batch size: 48 | lm loss: 6.460890E+00 | loss scale: 8192.0 | grad norm: 114878.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4993/ 159576 | consumed samples: 121536 | elapsed time per iteration (ms): 15882.0 | learning rate: 3.364E-05 | global batch size: 48 | lm loss: 6.485402E+00 | loss scale: 8192.0 | grad norm: 164153.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4994/ 159576 | consumed samples: 121584 | elapsed time per iteration (ms): 15543.1 | learning rate: 3.365E-05 | global batch size: 48 | lm loss: 6.511444E+00 | loss scale: 8192.0 | grad norm: 102365.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4995/ 159576 | consumed samples: 121632 | elapsed time per iteration (ms): 15538.2 | learning rate: 3.367E-05 | global batch size: 48 | lm loss: 6.413379E+00 | loss scale: 8192.0 | grad norm: 115181.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4996/ 159576 | consumed samples: 121680 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.368E-05 | global batch size: 48 | lm loss: 6.359092E+00 | loss scale: 8192.0 | grad norm: 117830.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4997/ 159576 | consumed samples: 121728 | elapsed time per iteration (ms): 15913.3 | learning rate: 3.369E-05 | global batch size: 48 | lm loss: 6.388143E+00 | loss scale: 8192.0 | grad norm: 90924.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4998/ 159576 | consumed samples: 121776 | elapsed time per iteration (ms): 15515.1 | learning rate: 3.371E-05 | global batch size: 48 | lm loss: 6.391698E+00 | loss scale: 8192.0 | grad norm: 103478.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 4999/ 159576 | consumed samples: 121824 | elapsed time per iteration (ms): 15550.7 | learning rate: 3.372E-05 | global batch size: 48 | lm loss: 6.279563E+00 | loss scale: 8192.0 | grad norm: 129145.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5000/ 159576 | consumed samples: 121872 | elapsed time per iteration (ms): 15649.4 | learning rate: 3.373E-05 | global batch size: 48 | lm loss: 6.273400E+00 | loss scale: 8192.0 | grad norm: 105673.602 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 5000 | lm loss value: 6.285937E+00 | lm loss PPL: 5.369671E+02 |
------------------------------------------------------------------------------------------------
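Note: the validation report is internally consistent; the logged "lm loss PPL" is just the exponential of the logged lm loss. A quick check in plain Python, using the values printed above:

    import math

    # "lm loss" is the mean per-token cross-entropy in nats, so the
    # reported perplexity is its exponential.
    lm_loss = 6.285937                  # lm loss value from the report above
    print(f"{math.exp(lm_loss):.6E}")   # 5.369634E+02 -- matches the reported
                                        # 5.369671E+02 up to the 6-digit
                                        # rounding of the printed loss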
iteration 5001/ 159576 | consumed samples: 121920 | elapsed time per iteration (ms): 23355.2 | learning rate: 3.375E-05 | global batch size: 48 | lm loss: 6.432467E+00 | loss scale: 8192.0 | grad norm: 94517.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5002/ 159576 | consumed samples: 121968 | elapsed time per iteration (ms): 15480.8 | learning rate: 3.376E-05 | global batch size: 48 | lm loss: 6.406679E+00 | loss scale: 8192.0 | grad norm: 93989.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5003/ 159576 | consumed samples: 122016 | elapsed time per iteration (ms): 15462.8 | learning rate: 3.377E-05 | global batch size: 48 | lm loss: 6.425644E+00 | loss scale: 8192.0 | grad norm: 89681.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5004/ 159576 | consumed samples: 122064 | elapsed time per iteration (ms): 15981.7 | learning rate: 3.379E-05 | global batch size: 48 | lm loss: 6.492604E+00 | loss scale: 8192.0 | grad norm: 95165.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5005/ 159576 | consumed samples: 122112 | elapsed time per iteration (ms): 15437.2 | learning rate: 3.380E-05 | global batch size: 48 | lm loss: 6.335800E+00 | loss scale: 8192.0 | grad norm: 84441.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5006/ 159576 | consumed samples: 122160 | elapsed time per iteration (ms): 15473.9 | learning rate: 3.381E-05 | global batch size: 48 | lm loss: 6.304031E+00 | loss scale: 8192.0 | grad norm: 87318.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5007/ 159576 | consumed samples: 122208 | elapsed time per iteration (ms): 15548.0 | learning rate: 3.383E-05 | global batch size: 48 | lm loss: 6.363890E+00 | loss scale: 8192.0 | grad norm: 92281.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5008/ 159576 | consumed samples: 122256 | elapsed time per iteration (ms): 15796.4 | learning rate: 3.384E-05 | global batch size: 48 | lm loss: 6.347075E+00 | loss scale: 8192.0 | grad norm: 103172.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5009/ 159576 | consumed samples: 122304 | elapsed time per iteration (ms): 15464.5 | learning rate: 3.385E-05 | global batch size: 48 | lm loss: 6.448061E+00 | loss scale: 8192.0 | grad norm: 95534.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5010/ 159576 | consumed samples: 122352 | elapsed time per iteration (ms): 15447.7 | learning rate: 3.387E-05 | global batch size: 48 | lm loss: 6.328472E+00 | loss scale: 8192.0 | grad norm: 84995.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5011/ 159576 | consumed samples: 122400 | elapsed time per iteration (ms): 15420.5 | learning rate: 3.388E-05 | global batch size: 48 | lm loss: 6.340866E+00 | loss scale: 8192.0 | grad norm: 82422.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5012/ 159576 | consumed samples: 122448 | elapsed time per iteration (ms): 15839.2 | learning rate: 3.389E-05 | global batch size: 48 | lm loss: 6.397783E+00 | loss scale: 8192.0 | grad norm: 162057.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5013/ 159576 | consumed samples: 122496 | elapsed time per iteration (ms): 15565.6 | learning rate: 3.391E-05 | global batch size: 48 | lm loss: 6.363326E+00 | loss scale: 8192.0 | grad norm: 86690.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5014/ 159576 | consumed samples: 122544 | elapsed time per iteration (ms): 15554.7 | learning rate: 3.392E-05 | global batch size: 48 | lm loss: 6.421363E+00 | loss scale: 8192.0 | grad norm: 102318.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5015/ 159576 | consumed samples: 122592 | elapsed time per iteration (ms): 15616.9 | learning rate: 3.393E-05 | global batch size: 48 | lm loss: 6.322345E+00 | loss scale: 8192.0 | grad norm: 83052.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5016/ 159576 | consumed samples: 122640 | elapsed time per iteration (ms): 15870.8 | learning rate: 3.395E-05 | global batch size: 48 | lm loss: 6.384270E+00 | loss scale: 8192.0 | grad norm: 167288.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5017/ 159576 | consumed samples: 122688 | elapsed time per iteration (ms): 15476.4 | learning rate: 3.396E-05 | global batch size: 48 | lm loss: 6.423479E+00 | loss scale: 8192.0 | grad norm: 86029.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5018/ 159576 | consumed samples: 122736 | elapsed time per iteration (ms): 15464.3 | learning rate: 3.397E-05 | global batch size: 48 | lm loss: 6.393809E+00 | loss scale: 8192.0 | grad norm: 123082.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5019/ 159576 | consumed samples: 122784 | elapsed time per iteration (ms): 15459.3 | learning rate: 3.399E-05 | global batch size: 48 | lm loss: 6.420121E+00 | loss scale: 8192.0 | grad norm: 82967.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5020/ 159576 | consumed samples: 122832 | elapsed time per iteration (ms): 15660.8 | learning rate: 3.400E-05 | global batch size: 48 | lm loss: 6.436828E+00 | loss scale: 8192.0 | grad norm: 94157.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 22:07:41] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 22:07:41] PULSE: tr8-104B is running for 16:15:30 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
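Note: "consumed samples" advances by exactly the global batch size (48) per step in this stretch, yet the running total (121920 at iteration 5001, not 5001 x 48 = 240048) is far below iteration x 48, consistent with a smaller global batch size during an earlier ramp-up phase that is not shown in this excerpt. A minimal sketch of the bookkeeping, checkable against the records above:

    # Each training step advances the sample counter by the current global
    # batch size; the log's "consumed samples" field is this running counter.
    consumed_samples = 121872   # value reported at iteration 5000
    global_batch_size = 48
    for iteration in range(5001, 5021):
        consumed_samples += global_batch_size
    print(consumed_samples)     # 122832, matching the iteration-5020 record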
iteration 5021/ 159576 | consumed samples: 122880 | elapsed time per iteration (ms): 15506.9 | learning rate: 3.401E-05 | global batch size: 48 | lm loss: 6.230031E+00 | loss scale: 8192.0 | grad norm: 93236.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5022/ 159576 | consumed samples: 122928 | elapsed time per iteration (ms): 15486.4 | learning rate: 3.403E-05 | global batch size: 48 | lm loss: 6.434629E+00 | loss scale: 8192.0 | grad norm: 88122.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5023/ 159576 | consumed samples: 122976 | elapsed time per iteration (ms): 15558.0 | learning rate: 3.404E-05 | global batch size: 48 | lm loss: 6.447264E+00 | loss scale: 8192.0 | grad norm: 99782.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5024/ 159576 | consumed samples: 123024 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.405E-05 | global batch size: 48 | lm loss: 6.403034E+00 | loss scale: 8192.0 | grad norm: 102592.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5025/ 159576 | consumed samples: 123072 | elapsed time per iteration (ms): 15429.0 | learning rate: 3.407E-05 | global batch size: 48 | lm loss: 6.433703E+00 | loss scale: 8192.0 | grad norm: 82492.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5026/ 159576 | consumed samples: 123120 | elapsed time per iteration (ms): 15492.8 | learning rate: 3.408E-05 | global batch size: 48 | lm loss: 6.505131E+00 | loss scale: 8192.0 | grad norm: 334700.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5027/ 159576 | consumed samples: 123168 | elapsed time per iteration (ms): 15456.4 | learning rate: 3.409E-05 | global batch size: 48 | lm loss: 6.312271E+00 | loss scale: 8192.0 | grad norm: 101204.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5028/ 159576 | consumed samples: 123216 | elapsed time per iteration (ms): 15841.8 | learning rate: 3.411E-05 | global batch size: 48 | lm loss: 6.368502E+00 | loss scale: 8192.0 | grad norm: 103816.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5029/ 159576 | consumed samples: 123264 | elapsed time per iteration (ms): 15474.5 | learning rate: 3.412E-05 | global batch size: 48 | lm loss: 6.350607E+00 | loss scale: 8192.0 | grad norm: 88025.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5030/ 159576 | consumed samples: 123312 | elapsed time per iteration (ms): 15468.9 | learning rate: 3.413E-05 | global batch size: 48 | lm loss: 6.421462E+00 | loss scale: 8192.0 | grad norm: 121501.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5031/ 159576 | consumed samples: 123360 | elapsed time per iteration (ms): 15894.7 | learning rate: 3.414E-05 | global batch size: 48 | lm loss: 6.452309E+00 | loss scale: 8192.0 | grad norm: 98299.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5032/ 159576 | consumed samples: 123408 | elapsed time per iteration (ms): 15372.6 | learning rate: 3.416E-05 | global batch size: 48 | lm loss: 6.470865E+00 | loss scale: 8192.0 | grad norm: 86033.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
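Note: across these records the learning rate is still climbing by roughly 1.3E-08 per step, i.e. the run is inside a linear warmup; for a sample-based schedule the per-step increment is lr_peak * global_batch_size / warmup_samples. A sketch with placeholder constants chosen only to reproduce that increment (the run's actual peak LR and warmup length are not shown in this excerpt):

    # Sample-based linear warmup: lr grows in proportion to consumed samples
    # until warmup_samples have been seen. lr_peak and warmup_samples are
    # placeholders (6e-5 * 48 / 216_320 ~= 1.33e-08 per step), not flags
    # taken from this run.
    def warmup_lr(consumed_samples, lr_peak=6.0e-5, warmup_samples=216_320):
        frac = min(consumed_samples / warmup_samples, 1.0)
        return lr_peak * frac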
iteration 5033/ 159576 | consumed samples: 123456 | elapsed time per iteration (ms): 15386.4 | learning rate: 3.417E-05 | global batch size: 48 | lm loss: 6.358019E+00 | loss scale: 8192.0 | grad norm: 102254.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5034/ 159576 | consumed samples: 123504 | elapsed time per iteration (ms): 15445.3 | learning rate: 3.418E-05 | global batch size: 48 | lm loss: 6.501051E+00 | loss scale: 8192.0 | grad norm: 106902.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5035/ 159576 | consumed samples: 123552 | elapsed time per iteration (ms): 15687.1 | learning rate: 3.420E-05 | global batch size: 48 | lm loss: 6.441896E+00 | loss scale: 8192.0 | grad norm: 88100.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5036/ 159576 | consumed samples: 123600 | elapsed time per iteration (ms): 15548.9 | learning rate: 3.421E-05 | global batch size: 48 | lm loss: 6.297223E+00 | loss scale: 8192.0 | grad norm: 92260.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5037/ 159576 | consumed samples: 123648 | elapsed time per iteration (ms): 15475.3 | learning rate: 3.422E-05 | global batch size: 48 | lm loss: 6.382265E+00 | loss scale: 8192.0 | grad norm: 91449.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5038/ 159576 | consumed samples: 123696 | elapsed time per iteration (ms): 15468.3 | learning rate: 3.424E-05 | global batch size: 48 | lm loss: 6.354884E+00 | loss scale: 8192.0 | grad norm: 112737.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5039/ 159576 | consumed samples: 123744 | elapsed time per iteration (ms): 15758.7 | learning rate: 3.425E-05 | global batch size: 48 | lm loss: 6.504280E+00 | loss scale: 8192.0 | grad norm: 106073.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5040/ 159576 | consumed samples: 123792 | elapsed time per iteration (ms): 15421.0 | learning rate: 3.426E-05 | global batch size: 48 | lm loss: 6.361072E+00 | loss scale: 8192.0 | grad norm: 127074.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5041/ 159576 | consumed samples: 123840 | elapsed time per iteration (ms): 15385.1 | learning rate: 3.428E-05 | global batch size: 48 | lm loss: 6.289526E+00 | loss scale: 8192.0 | grad norm: 92444.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5042/ 159576 | consumed samples: 123888 | elapsed time per iteration (ms): 15433.3 | learning rate: 3.429E-05 | global batch size: 48 | lm loss: 6.276048E+00 | loss scale: 8192.0 | grad norm: 95460.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5043/ 159576 | consumed samples: 123936 | elapsed time per iteration (ms): 15839.0 | learning rate: 3.430E-05 | global batch size: 48 | lm loss: 6.447580E+00 | loss scale: 8192.0 | grad norm: 140216.976 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5044/ 159576 | consumed samples: 123984 | elapsed time per iteration (ms): 15579.5 | learning rate: 3.432E-05 | global batch size: 48 | lm loss: 6.390550E+00
| loss scale: 8192.0 | grad norm: 103110.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5045/ 159576 | consumed samples: 124032 | elapsed time per iteration (ms): 15508.8 | learning rate: 3.433E-05 | global batch size: 48 | lm loss: 6.326768E+00 | loss scale: 8192.0 | grad norm: 143773.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5046/ 159576 | consumed samples: 124080 | elapsed time per iteration (ms): 15498.6 | learning rate: 3.434E-05 | global batch size: 48 | lm loss: 6.474419E+00 | loss scale: 8192.0 | grad norm: 112141.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5047/ 159576 | consumed samples: 124128 | elapsed time per iteration (ms): 15657.7 | learning rate: 3.436E-05 | global batch size: 48 | lm loss: 6.411184E+00 | loss scale: 8192.0 | grad norm: 106306.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5048/ 159576 | consumed samples: 124176 | elapsed time per iteration (ms): 15457.2 | learning rate: 3.437E-05 | global batch size: 48 | lm loss: 6.448883E+00 | loss scale: 8192.0 | grad norm: 119234.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5049/ 159576 | consumed samples: 124224 | elapsed time per iteration (ms): 15413.6 | learning rate: 3.438E-05 | global batch size: 48 | lm loss: 6.307952E+00 | loss scale: 8192.0 | grad norm: 94509.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5050/ 159576 | consumed samples: 124272 | elapsed time per iteration (ms): 15423.5 | learning rate: 3.440E-05 | global batch size: 48 | lm loss: 6.399596E+00 | loss scale: 8192.0 | grad norm: 107196.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5051/ 159576 | consumed samples: 124320 | elapsed time per iteration (ms): 15555.5 | learning rate: 3.441E-05 | global batch size: 48 | lm loss: 6.345298E+00 | loss scale: 8192.0 | grad norm: 101445.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5052/ 159576 | consumed samples: 124368 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.442E-05 | global batch size: 48 | lm loss: 6.399672E+00 | loss scale: 8192.0 | grad norm: 101071.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5053/ 159576 | consumed samples: 124416 | elapsed time per iteration (ms): 15538.7 | learning rate: 3.444E-05 | global batch size: 48 | lm loss: 6.306325E+00 | loss scale: 8192.0 | grad norm: 130980.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5054/ 159576 | consumed samples: 124464 | elapsed time per iteration (ms): 15446.5 | learning rate: 3.445E-05 | global batch size: 48 | lm loss: 6.360683E+00 | loss scale: 8192.0 | grad norm: 138731.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5055/ 159576 | consumed samples: 124512 | elapsed time per iteration (ms): 15548.6 | learning rate: 3.446E-05 | global batch size: 48 | lm loss: 6.415308E+00 | loss scale: 8192.0 | grad norm: 172722.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5056/ 159576 | consumed samples: 124560 | 
elapsed time per iteration (ms): 15454.2 | learning rate: 3.448E-05 | global batch size: 48 | lm loss: 6.446492E+00 | loss scale: 8192.0 | grad norm: 114779.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5057/ 159576 | consumed samples: 124608 | elapsed time per iteration (ms): 15531.5 | learning rate: 3.449E-05 | global batch size: 48 | lm loss: 6.352797E+00 | loss scale: 8192.0 | grad norm: 93911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5058/ 159576 | consumed samples: 124656 | elapsed time per iteration (ms): 15916.6 | learning rate: 3.450E-05 | global batch size: 48 | lm loss: 6.394308E+00 | loss scale: 8192.0 | grad norm: 122896.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5059/ 159576 | consumed samples: 124704 | elapsed time per iteration (ms): 15639.0 | learning rate: 3.452E-05 | global batch size: 48 | lm loss: 6.497361E+00 | loss scale: 8192.0 | grad norm: 111301.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5060/ 159576 | consumed samples: 124752 | elapsed time per iteration (ms): 15585.9 | learning rate: 3.453E-05 | global batch size: 48 | lm loss: 6.416485E+00 | loss scale: 8192.0 | grad norm: 111209.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5061/ 159576 | consumed samples: 124800 | elapsed time per iteration (ms): 15476.2 | learning rate: 3.454E-05 | global batch size: 48 | lm loss: 6.385825E+00 | loss scale: 8192.0 | grad norm: 124134.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5062/ 159576 | consumed samples: 124848 | elapsed time per iteration (ms): 15734.0 | learning rate: 3.456E-05 | global batch size: 48 | lm loss: 6.419828E+00 | loss scale: 8192.0 | grad norm: 115134.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5063/ 159576 | consumed samples: 124896 | elapsed time per iteration (ms): 15427.5 | learning rate: 3.457E-05 | global batch size: 48 | lm loss: 6.501984E+00 | loss scale: 8192.0 | grad norm: 94348.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5064/ 159576 | consumed samples: 124944 | elapsed time per iteration (ms): 15367.7 | learning rate: 3.458E-05 | global batch size: 48 | lm loss: 6.435040E+00 | loss scale: 8192.0 | grad norm: 107056.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5065/ 159576 | consumed samples: 124992 | elapsed time per iteration (ms): 15376.7 | learning rate: 3.460E-05 | global batch size: 48 | lm loss: 6.347174E+00 | loss scale: 8192.0 | grad norm: 107513.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5066/ 159576 | consumed samples: 125040 | elapsed time per iteration (ms): 15861.2 | learning rate: 3.461E-05 | global batch size: 48 | lm loss: 6.473555E+00 | loss scale: 8192.0 | grad norm: 96134.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5067/ 159576 | consumed samples: 125088 | elapsed time per iteration (ms): 15376.8 | learning rate: 3.462E-05 | global batch size: 48 | lm loss: 6.364458E+00 | loss scale: 8192.0 | grad norm: 110987.016 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5068/ 159576 | consumed samples: 125136 | elapsed time per iteration (ms): 15511.1 | learning rate: 3.464E-05 | global batch size: 48 | lm loss: 6.441058E+00 | loss scale: 8192.0 | grad norm: 135931.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5069/ 159576 | consumed samples: 125184 | elapsed time per iteration (ms): 15475.4 | learning rate: 3.465E-05 | global batch size: 48 | lm loss: 6.324648E+00 | loss scale: 8192.0 | grad norm: 108716.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5070/ 159576 | consumed samples: 125232 | elapsed time per iteration (ms): 15862.4 | learning rate: 3.466E-05 | global batch size: 48 | lm loss: 6.318436E+00 | loss scale: 8192.0 | grad norm: 103967.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5071/ 159576 | consumed samples: 125280 | elapsed time per iteration (ms): 15504.6 | learning rate: 3.468E-05 | global batch size: 48 | lm loss: 6.395255E+00 | loss scale: 8192.0 | grad norm: 108399.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5072/ 159576 | consumed samples: 125328 | elapsed time per iteration (ms): 15377.1 | learning rate: 3.469E-05 | global batch size: 48 | lm loss: 6.379922E+00 | loss scale: 8192.0 | grad norm: 103462.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5073/ 159576 | consumed samples: 125376 | elapsed time per iteration (ms): 15411.3 | learning rate: 3.470E-05 | global batch size: 48 | lm loss: 6.396028E+00 | loss scale: 8192.0 | grad norm: 95480.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5074/ 159576 | consumed samples: 125424 | elapsed time per iteration (ms): 15799.1 | learning rate: 3.472E-05 | global batch size: 48 | lm loss: 6.413391E+00 | loss scale: 8192.0 | grad norm: 150193.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5075/ 159576 | consumed samples: 125472 | elapsed time per iteration (ms): 15638.7 | learning rate: 3.473E-05 | global batch size: 48 | lm loss: 6.308775E+00 | loss scale: 8192.0 | grad norm: 129289.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5076/ 159576 | consumed samples: 125520 | elapsed time per iteration (ms): 15490.0 | learning rate: 3.474E-05 | global batch size: 48 | lm loss: 6.273424E+00 | loss scale: 8192.0 | grad norm: 137408.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5077/ 159576 | consumed samples: 125568 | elapsed time per iteration (ms): 15408.8 | learning rate: 3.476E-05 | global batch size: 48 | lm loss: 6.402836E+00 | loss scale: 8192.0 | grad norm: 549435.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5078/ 159576 | consumed samples: 125616 | elapsed time per iteration (ms): 15586.3 | learning rate: 3.477E-05 | global batch size: 48 | lm loss: 6.309762E+00 | loss scale: 8192.0 | grad norm: 104483.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5079/ 159576 | consumed samples: 125664 | elapsed time per iteration (ms): 15542.8 | learning rate: 3.478E-05 | 
global batch size: 48 | lm loss: 6.315629E+00 | loss scale: 8192.0 | grad norm: 91616.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5080/ 159576 | consumed samples: 125712 | elapsed time per iteration (ms): 15472.1 | learning rate: 3.480E-05 | global batch size: 48 | lm loss: 6.554045E+00 | loss scale: 8192.0 | grad norm: 172370.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5081/ 159576 | consumed samples: 125760 | elapsed time per iteration (ms): 15563.9 | learning rate: 3.481E-05 | global batch size: 48 | lm loss: 6.355201E+00 | loss scale: 8192.0 | grad norm: 125519.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5082/ 159576 | consumed samples: 125808 | elapsed time per iteration (ms): 15777.1 | learning rate: 3.482E-05 | global batch size: 48 | lm loss: 6.435748E+00 | loss scale: 8192.0 | grad norm: 122698.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5083/ 159576 | consumed samples: 125856 | elapsed time per iteration (ms): 15566.4 | learning rate: 3.484E-05 | global batch size: 48 | lm loss: 6.269705E+00 | loss scale: 8192.0 | grad norm: 120100.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5084/ 159576 | consumed samples: 125904 | elapsed time per iteration (ms): 15633.9 | learning rate: 3.485E-05 | global batch size: 48 | lm loss: 6.357334E+00 | loss scale: 8192.0 | grad norm: 98996.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5085/ 159576 | consumed samples: 125952 | elapsed time per iteration (ms): 15985.6 | learning rate: 3.486E-05 | global batch size: 48 | lm loss: 6.393430E+00 | loss scale: 8192.0 | grad norm: 96935.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5086/ 159576 | consumed samples: 126000 | elapsed time per iteration (ms): 15483.1 | learning rate: 3.488E-05 | global batch size: 48 | lm loss: 6.307817E+00 | loss scale: 8192.0 | grad norm: 105392.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5087/ 159576 | consumed samples: 126048 | elapsed time per iteration (ms): 15492.6 | learning rate: 3.489E-05 | global batch size: 48 | lm loss: 6.307018E+00 | loss scale: 8192.0 | grad norm: 119838.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5088/ 159576 | consumed samples: 126096 | elapsed time per iteration (ms): 15510.3 | learning rate: 3.490E-05 | global batch size: 48 | lm loss: 6.400391E+00 | loss scale: 8192.0 | grad norm: 124265.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5089/ 159576 | consumed samples: 126144 | elapsed time per iteration (ms): 15885.9 | learning rate: 3.492E-05 | global batch size: 48 | lm loss: 6.333194E+00 | loss scale: 8192.0 | grad norm: 115702.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5090/ 159576 | consumed samples: 126192 | elapsed time per iteration (ms): 15544.2 | learning rate: 3.493E-05 | global batch size: 48 | lm loss: 6.331620E+00 | loss scale: 8192.0 | grad norm: 137239.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
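Note: at a steady ~15.5 s per iteration and 48 samples per global batch, sustained throughput in this stretch is about 3.1 samples/s, and roughly 6.3k tokens/s if the sequence length is 2048 (an assumption; the sequence length is not shown in this excerpt). A back-of-the-envelope check:

    # Rough throughput/ETA from the log fields; seq_len is an assumption.
    elapsed_ms = 15500.0        # typical "elapsed time per iteration (ms)" above
    global_batch_size = 48
    seq_len = 2048              # assumed tokens per sample, not shown here
    samples_per_s = global_batch_size / (elapsed_ms / 1e3)       # ~3.1
    tokens_per_s = samples_per_s * seq_len                       # ~6.3e3
    remaining_days = (159576 - 5090) * elapsed_ms / 1e3 / 86400  # ~28
    print(f"{samples_per_s:.1f} samples/s, {tokens_per_s:.0f} tokens/s, "
          f"~{remaining_days:.0f} days left at this rate")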
iteration 5091/ 159576 | consumed samples: 126240 | elapsed time per iteration (ms): 15557.8 | learning rate: 3.494E-05 | global batch size: 48 | lm loss: 6.437903E+00 | loss scale: 8192.0 | grad norm: 233688.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5092/ 159576 | consumed samples: 126288 | elapsed time per iteration (ms): 15511.8 | learning rate: 3.496E-05 | global batch size: 48 | lm loss: 6.421580E+00 | loss scale: 8192.0 | grad norm: 127898.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5093/ 159576 | consumed samples: 126336 | elapsed time per iteration (ms): 16146.9 | learning rate: 3.497E-05 | global batch size: 48 | lm loss: 6.348750E+00 | loss scale: 8192.0 | grad norm: 200287.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5094/ 159576 | consumed samples: 126384 | elapsed time per iteration (ms): 15650.7 | learning rate: 3.498E-05 | global batch size: 48 | lm loss: 6.384042E+00 | loss scale: 8192.0 | grad norm: 141808.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5095/ 159576 | consumed samples: 126432 | elapsed time per iteration (ms): 15549.8 | learning rate: 3.500E-05 | global batch size: 48 | lm loss: 6.380728E+00 | loss scale: 8192.0 | grad norm: 113750.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5096/ 159576 | consumed samples: 126480 | elapsed time per iteration (ms): 15494.8 | learning rate: 3.501E-05 | global batch size: 48 | lm loss: 6.329007E+00 | loss scale: 8192.0 | grad norm: 142607.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5097/ 159576 | consumed samples: 126528 | elapsed time per iteration (ms): 15805.4 | learning rate: 3.502E-05 | global batch size: 48 | lm loss: 6.331810E+00 | loss scale: 8192.0 | grad norm: 125989.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5098/ 159576 | consumed samples: 126576 | elapsed time per iteration (ms): 15560.8 | learning rate: 3.504E-05 | global batch size: 48 | lm loss: 6.349818E+00 | loss scale: 8192.0 | grad norm: 164955.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5099/ 159576 | consumed samples: 126624 | elapsed time per iteration (ms): 15574.8 | learning rate: 3.505E-05 | global batch size: 48 | lm loss: 6.511029E+00 | loss scale: 8192.0 | grad norm: 150219.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5100/ 159576 | consumed samples: 126672 | elapsed time per iteration (ms): 15588.9 | learning rate: 3.506E-05 | global batch size: 48 | lm loss: 6.365673E+00 | loss scale: 8192.0 | grad norm: 132801.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5101/ 159576 | consumed samples: 126720 | elapsed time per iteration (ms): 15620.0 | learning rate: 3.508E-05 | global batch size: 48 | lm loss: 6.393438E+00 | loss scale: 8192.0 | grad norm: 181251.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5102/ 159576 | consumed samples: 126768 | elapsed time per iteration (ms): 15489.4 | learning rate: 3.509E-05 | global batch size: 48 | lm loss: 6.416411E+00 | loss scale: 8192.0 | grad norm: 117102.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5103/ 159576 | consumed samples: 126816 | elapsed time per iteration (ms): 15557.2 | learning rate: 3.510E-05 | global batch size: 48 | lm loss: 6.328413E+00 | loss scale: 8192.0 | grad norm: 187671.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5104/ 159576 | consumed samples: 126864 | elapsed time per iteration (ms): 15527.6 | learning rate: 3.512E-05 | global batch size: 48 | lm loss: 6.465903E+00 | loss scale: 8192.0 | grad norm: 190613.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5105/ 159576 | consumed samples: 126912 | elapsed time per iteration (ms): 8977.0 | learning rate: 3.512E-05 | global batch size: 48 | lm loss: 6.508333E+00 | loss scale: 4096.0 | grad norm: 190613.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
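Note: the drop from loss scale 8192.0 to 4096.0 at iteration 5105 is the fp16 dynamic loss scaler reacting to a gradient overflow: the scale is halved on overflow and only raised again after a window of overflow-free steps. The unusually short 8977.0 ms step and the repeated grad norm value at 5105 are consistent with the overflowing step doing no real optimizer work. A minimal sketch of the halve-on-overflow / grow-after-interval policy, with illustrative constants rather than this run's settings:

    # Illustrative dynamic loss scaler: halve on overflow, double after
    # `growth_interval` consecutive overflow-free steps. The constants are
    # placeholders, not the run's actual configuration.
    class DynamicLossScaler:
        def __init__(self, scale=8192.0, growth_interval=1000, min_scale=1.0):
            self.scale = scale
            self.growth_interval = growth_interval
            self.min_scale = min_scale
            self._good_steps = 0

        def update(self, found_overflow: bool) -> None:
            if found_overflow:
                # Back off on overflow, e.g. 8192.0 -> 4096.0 as seen above.
                self.scale = max(self.scale / 2, self.min_scale)
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps % self.growth_interval == 0:
                    self.scale *= 2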
iteration 5114/ 159576 | consumed samples: 127344 | elapsed time per iteration (ms): 15462.8 | learning rate: 3.524E-05 | global batch size: 48 | lm loss: 6.448624E+00 | loss scale: 4096.0 | grad norm: 79184.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5115/ 159576 | consumed samples: 127392 | elapsed time per iteration (ms): 15488.2 | learning rate: 3.525E-05 | global batch size: 48 | lm loss: 6.440034E+00 | loss scale: 4096.0 | grad norm: 65278.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5116/ 159576 | consumed samples: 127440 | elapsed time per iteration (ms): 15517.5 | learning rate: 3.526E-05 | global batch size: 48 | lm loss: 6.452240E+00 | loss scale: 4096.0 | grad norm: 81154.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5117/ 159576 | consumed samples: 127488 | elapsed time per iteration (ms): 15650.3 | learning rate: 3.528E-05 | global batch size: 48 | lm loss: 6.352810E+00 | loss scale: 4096.0 | grad norm: 70667.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5118/ 159576 | consumed samples: 127536 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.529E-05 | global batch size: 48 | lm loss: 6.422338E+00 | loss scale: 4096.0 | grad norm: 76003.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5119/ 159576 | consumed samples: 127584 | elapsed time per iteration (ms): 15525.1 | learning rate: 3.530E-05 | global batch size: 48 | lm loss: 6.345719E+00 | loss scale: 4096.0 | grad norm: 75153.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5120/ 159576 | consumed samples: 127632 | elapsed time per iteration (ms): 15941.5 | learning rate: 3.532E-05 | global batch size: 48 | lm loss: 6.406080E+00 | loss scale: 4096.0 | grad norm: 61393.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5121/ 159576 | consumed samples: 127680 | elapsed time per iteration (ms): 15581.4 | learning rate: 3.533E-05 | global batch size: 48 | lm loss: 6.333064E+00 | loss scale: 4096.0 | grad norm: 84273.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5122/ 159576 | consumed samples: 127728 | elapsed time per iteration (ms): 15534.4 | learning rate: 3.534E-05 | global batch size: 48 | lm loss: 6.430450E+00 | loss scale: 4096.0 | grad norm: 71025.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5123/ 159576 | consumed samples: 127776 | elapsed time per iteration (ms): 15491.5 | learning rate: 3.536E-05 | global batch size: 48 | lm loss: 6.372457E+00 | loss scale: 4096.0 | grad norm: 60958.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5124/ 159576 | consumed samples: 127824 | elapsed time per iteration (ms): 15825.8 | learning rate: 3.537E-05 | global batch size: 48 | lm loss: 6.359689E+00 | loss scale: 4096.0 | grad norm: 69184.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5125/ 159576 | consumed samples: 127872 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.538E-05 | global batch size: 48 | lm loss: 6.354432E+00 | loss scale: 4096.0 | grad norm: 81726.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5126/ 159576 | consumed samples: 127920 | elapsed time per iteration (ms): 15546.1 | learning rate: 3.540E-05 | global batch size: 48 | lm loss: 6.383263E+00 | loss scale: 4096.0 | grad norm: 67932.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5127/ 159576 | consumed samples: 127968 | elapsed time per iteration (ms): 15512.5 | learning rate: 3.541E-05 | global batch size: 48 | lm loss: 6.323973E+00 | loss scale: 4096.0 | grad norm: 69551.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5128/ 159576 | consumed samples: 128016 | elapsed time per iteration (ms): 15872.2 | learning rate: 3.542E-05 | global batch size: 48 | lm loss: 6.384116E+00 | loss scale: 4096.0 | grad norm: 66160.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5129/ 159576 | consumed samples: 128064 | elapsed time per iteration (ms): 15540.5 | learning rate: 3.544E-05 | global batch size: 48 | lm loss: 6.273410E+00 | loss scale: 4096.0 | grad norm: 68712.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5130/ 159576 | consumed samples: 128112 | elapsed time per iteration (ms): 15510.9 | learning rate: 3.545E-05 | global batch size: 48 | lm loss: 6.393827E+00 | loss scale: 4096.0 | grad norm: 80347.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5131/ 159576 | consumed samples: 128160 | elapsed time per iteration (ms): 15546.9 | learning rate: 3.546E-05 | global batch size: 48 | lm loss: 6.494912E+00 | loss scale: 4096.0 | grad norm: 79601.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5132/ 159576 | consumed samples: 128208 | elapsed time per iteration (ms): 15850.8 | learning rate: 3.548E-05 | global batch size: 48 | lm loss: 6.363180E+00 | loss scale: 4096.0 | grad norm: 59957.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5133/ 159576 | consumed samples: 128256 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.549E-05 | global batch size: 48 | lm loss: 6.361386E+00 | loss scale: 4096.0 | grad norm: 65589.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5134/ 159576 | consumed samples: 128304 | elapsed time per iteration (ms): 15554.8 | learning rate: 3.550E-05 | global batch size: 48 | lm loss: 6.338229E+00 | loss scale: 4096.0 | grad norm: 70953.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5135/ 159576 | consumed samples: 128352 | elapsed time per iteration (ms): 15508.1 | learning rate: 3.552E-05 | global batch size: 48 | lm loss: 6.265258E+00 | loss scale: 4096.0 | grad norm: 101476.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5136/ 159576 | consumed samples: 128400 | elapsed time per iteration (ms): 15713.9 | learning rate: 3.553E-05 | global batch size: 48 | lm loss: 6.443205E+00 | loss scale: 4096.0 | grad norm: 70676.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5137/ 159576 | consumed samples: 128448 | elapsed time per iteration (ms): 15500.3 | learning rate: 3.554E-05 | global batch size: 48 | lm loss: 6.297948E+00 | loss scale: 4096.0 | grad norm: 50734.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5138/ 159576 | consumed samples: 128496 | elapsed time per iteration (ms): 15505.3 | learning rate: 3.556E-05 | global batch size: 48 | lm loss: 6.343609E+00 | loss scale: 4096.0 | grad norm: 67207.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5139/ 159576 | consumed samples: 128544 | elapsed time per iteration (ms): 15531.1 | learning rate: 3.557E-05 | global batch size: 48 | lm loss: 6.422406E+00 | loss scale: 4096.0 | grad norm: 50444.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5140/ 159576 | consumed samples: 128592 | elapsed time per iteration (ms): 15679.9 | learning rate: 3.558E-05 | global batch size: 48 | lm loss: 6.377341E+00 | loss scale: 4096.0 | grad norm: 71866.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5141/ 159576 | consumed samples: 128640 | elapsed time per iteration (ms): 15549.3 | learning rate: 3.560E-05 | global batch size: 48 | lm loss: 6.403359E+00 | loss scale: 4096.0 | grad norm: 64942.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5142/ 159576 | consumed samples: 128688 | elapsed time per iteration (ms): 15525.2 | learning rate: 3.561E-05 | global batch size: 48 | lm loss: 6.390831E+00 | loss scale: 4096.0 | grad norm: 66674.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5143/ 159576 | consumed samples: 128736 | elapsed time per iteration (ms): 15540.8 | learning rate: 3.562E-05 | global batch size: 48 | lm loss: 6.391725E+00 | loss scale: 4096.0 | grad norm: 59980.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5144/ 159576 | consumed samples: 128784 | elapsed time per iteration (ms): 15885.0 | learning rate: 3.564E-05 | global batch size: 48 | lm loss: 6.459509E+00 | loss scale: 4096.0 | grad norm: 136366.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5145/ 159576 | consumed samples: 128832 | elapsed time per iteration (ms): 15452.0 | learning rate: 3.565E-05 | global batch size: 48 | lm loss: 6.528796E+00 | loss scale: 4096.0 | grad norm: 82183.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5146/ 159576 | consumed samples: 128880 | elapsed time per iteration (ms): 15509.1 | learning rate: 3.566E-05 | global batch size: 48 | lm loss: 6.420625E+00 | loss scale: 4096.0 | grad norm: 69812.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5147/ 159576 | consumed samples: 128928 | elapsed time per iteration (ms): 15918.9 | learning rate: 3.568E-05 | global batch size: 48 | lm loss: 6.436305E+00 | loss scale: 4096.0 | grad norm: 63955.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5148/ 159576 | consumed samples: 128976 | elapsed time per iteration (ms): 15526.4 | learning rate: 3.569E-05 | global batch size: 48 | lm loss: 6.339918E+00 | loss scale: 4096.0 | grad norm: 56857.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
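Throughout this stretch consumed samples advance by exactly 48 per step (the global batch size) and the step time hovers around 15.5 s. A back-of-envelope throughput projection from just those two figures, as a sketch (naive on purpose: the batch-size ramp-up scheduled for this run will change both numbers):

# Naive projection from the interval above (illustrative only: it assumes
# the global batch size and step time stay at their current values).
TOTAL_ITERS  = 159_576
cur_iter     = 5_148
sec_per_iter = 15.5      # ~15,400-15,900 ms per iteration in the log
batch        = 48        # consumed samples grow by exactly 48 per step

remaining = TOTAL_ITERS - cur_iter
print(f"remaining samples at gbs={batch}: {remaining * batch:,}")
print(f"naive ETA: {remaining * sec_per_iter / 86_400:.1f} days")

That comes out to roughly 28 days at the current pace, which is why ramping the batch size up matters.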
iteration 5149/ 159576 | consumed samples: 129024 | elapsed time per iteration (ms): 15529.0 | learning rate: 3.570E-05 | global batch size: 48 | lm loss: 6.345021E+00 | loss scale: 4096.0 | grad norm: 93115.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5150/ 159576 | consumed samples: 129072 | elapsed time per iteration (ms): 15542.6 | learning rate: 3.572E-05 | global batch size: 48 | lm loss: 6.311335E+00 | loss scale: 4096.0 | grad norm: 61629.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5151/ 159576 | consumed samples: 129120 | elapsed time per iteration (ms): 15904.0 | learning rate: 3.573E-05 | global batch size: 48 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 65208.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5152/ 159576 | consumed samples: 129168 | elapsed time per iteration (ms): 15450.1 | learning rate: 3.574E-05 | global batch size: 48 | lm loss: 6.345972E+00 | loss scale: 4096.0 | grad norm: 72003.182 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5153/ 159576 | consumed samples: 129216 | elapsed time per iteration (ms): 15533.3 | learning rate: 3.576E-05 | global batch size: 48 | lm loss: 6.411428E+00 | loss scale: 4096.0 | grad norm: 105237.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5154/ 159576 | consumed samples: 129264 | elapsed time per iteration (ms): 15505.2 | learning rate: 3.577E-05 | global batch size: 48 | lm loss: 6.320354E+00 | loss scale: 4096.0 | grad norm: 101458.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5155/ 159576 | consumed samples: 129312 | elapsed time per iteration (ms): 15994.4 | learning rate: 3.578E-05 | global batch size: 48 | lm loss: 6.453386E+00 | loss scale: 4096.0 | grad norm: 118215.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5156/ 159576 | consumed samples: 129360 | elapsed time per iteration (ms): 15565.8 | learning rate: 3.580E-05 | global batch size: 48 | lm loss: 6.443649E+00 | loss scale: 4096.0 | grad norm: 72691.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5157/ 159576 | consumed samples: 129408 | elapsed time per iteration (ms): 15539.2 | learning rate: 3.581E-05 | global batch size: 48 | lm loss: 6.528984E+00 | loss scale: 4096.0 | grad norm: 72165.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5158/ 159576 | consumed samples: 129456 | elapsed time per iteration (ms): 15536.3 | learning rate: 3.582E-05 | global batch size: 48 | lm loss: 6.398818E+00 | loss scale: 4096.0 | grad norm: 69046.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5159/ 159576 | consumed samples: 129504 | elapsed time per iteration (ms): 15739.5 | learning rate: 3.584E-05 | global batch size: 48 | lm loss: 6.384636E+00 | loss scale: 4096.0 | grad norm: 65721.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5160/ 159576 | consumed samples: 129552 | elapsed time per iteration (ms): 15530.3 | learning rate: 3.585E-05 | global batch size: 48 | lm loss: 6.340583E+00 | loss scale: 4096.0 | grad norm: 70984.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5161/ 159576 | consumed samples: 129600 | elapsed time per iteration (ms): 15537.1 | learning rate: 3.586E-05 | global batch size: 48 | lm loss: 6.299366E+00 | loss scale: 4096.0 | grad norm: 120531.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5162/ 159576 | consumed samples: 129648 | elapsed time per iteration (ms): 15525.1 | learning rate: 3.588E-05 | global batch size: 48 | lm loss: 6.422726E+00 | loss scale: 4096.0 | grad norm: 80943.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5163/ 159576 | consumed samples: 129696 | elapsed time per iteration (ms): 15737.7 | learning rate: 3.589E-05 | global batch size: 48 | lm loss: 6.343781E+00 | loss scale: 4096.0 | grad norm: 62800.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5164/ 159576 | consumed samples: 129744 | elapsed time per iteration (ms): 15570.2 | learning rate: 3.590E-05 | global batch size: 48 | lm loss: 6.478961E+00 | loss scale: 4096.0 | grad norm: 49279.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5165/ 159576 | consumed samples: 129792 | elapsed time per iteration (ms): 15467.9 | learning rate: 3.592E-05 | global batch size: 48 | lm loss: 6.465704E+00 | loss scale: 4096.0 | grad norm: 56608.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5166/ 159576 | consumed samples: 129840 | elapsed time per iteration (ms): 15511.0 | learning rate: 3.593E-05 | global batch size: 48 | lm loss: 6.389446E+00 | loss scale: 4096.0 | grad norm: 64287.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5167/ 159576 | consumed samples: 129888 | elapsed time per iteration (ms): 15650.0 | learning rate: 3.594E-05 | global batch size: 48 | lm loss: 6.432152E+00 | loss scale: 4096.0 | grad norm: 68389.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5168/ 159576 | consumed samples: 129936 | elapsed time per iteration (ms): 15501.5 | learning rate: 3.596E-05 | global batch size: 48 | lm loss: 6.311705E+00 | loss scale: 4096.0 | grad norm: 60127.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5169/ 159576 | consumed samples: 129984 | elapsed time per iteration (ms): 15500.0 | learning rate: 3.597E-05 | global batch size: 48 | lm loss: 6.459386E+00 | loss scale: 4096.0 | grad norm: 193850.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5170/ 159576 | consumed samples: 130032 | elapsed time per iteration (ms): 15853.5 | learning rate: 3.598E-05 | global batch size: 48 | lm loss: 6.359794E+00 | loss scale: 4096.0 | grad norm: 201400.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5171/ 159576 | consumed samples: 130080 | elapsed time per iteration (ms): 15565.6 | learning rate: 3.600E-05 | global batch size: 48 | lm loss: 6.447841E+00 | loss scale: 4096.0 | grad norm: 60758.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5172/ 159576 | consumed samples: 130128 | elapsed time per iteration (ms): 15439.0 | learning rate: 3.601E-05 | global batch size: 48 | lm loss: 6.390144E+00 | loss scale: 4096.0 | grad norm: 60173.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5173/ 159576 | consumed samples: 130176 | elapsed time per iteration (ms): 15512.4 | learning rate: 3.602E-05 | global batch size: 48 | lm loss: 6.471553E+00 | loss scale: 4096.0 | grad norm: 65209.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5174/ 159576 | consumed samples: 130224 | elapsed time per iteration (ms): 15753.1 | learning rate: 3.604E-05 | global batch size: 48 | lm loss: 6.363354E+00 | loss scale: 4096.0 | grad norm: 66471.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5175/ 159576 | consumed samples: 130272 | elapsed time per iteration (ms): 15415.5 | learning rate: 3.605E-05 | global batch size: 48 | lm loss: 6.418964E+00 | loss scale: 4096.0 | grad norm: 63654.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5176/ 159576 | consumed samples: 130320 | elapsed time per iteration (ms): 15469.1 | learning rate: 3.606E-05 | global batch size: 48 | lm loss: 6.357801E+00 | loss scale: 4096.0 | grad norm: 82288.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5177/ 159576 | consumed samples: 130368 | elapsed time per iteration (ms): 15407.1 | learning rate: 3.608E-05 | global batch size: 48 | lm loss: 6.479723E+00 | loss scale: 4096.0 | grad norm: 63508.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5178/ 159576 | consumed samples: 130416 | elapsed time per iteration (ms): 15785.1 | learning rate: 3.609E-05 | global batch size: 48 | lm loss: 6.532706E+00 | loss scale: 4096.0 | grad norm: 62734.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5179/ 159576 | consumed samples: 130464 | elapsed time per iteration (ms): 15467.8 | learning rate: 3.610E-05 | global batch size: 48 | lm loss: 6.442670E+00 | loss scale: 4096.0 | grad norm: 64963.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5180/ 159576 | consumed samples: 130512 | elapsed time per iteration (ms): 15479.5 | learning rate: 3.612E-05 | global batch size: 48 | lm loss: 6.373410E+00 | loss scale: 4096.0 | grad norm: 62492.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5181/ 159576 | consumed samples: 130560 | elapsed time per iteration (ms): 15413.5 | learning rate: 3.613E-05 | global batch size: 48 | lm loss: 6.442731E+00 | loss scale: 4096.0 | grad norm: 93654.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5182/ 159576 | consumed samples: 130608 | elapsed time per iteration (ms): 15788.0 | learning rate: 3.614E-05 | global batch size: 48 | lm loss: 6.356236E+00 | loss scale: 4096.0 | grad norm: 77133.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5183/ 159576 | consumed samples: 130656 | elapsed time per iteration (ms): 15436.5 | learning rate: 3.616E-05 | global batch size: 48 | lm loss: 6.321268E+00 | loss scale: 4096.0 | grad norm: 138010.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5184/ 159576 | consumed samples: 130704 | elapsed time per iteration (ms): 15417.0 | learning rate: 3.617E-05 | global batch size: 48 | lm loss: 6.463357E+00 | loss scale: 4096.0 | grad norm: 67977.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5185/ 159576 | consumed samples: 130752 | elapsed time per iteration (ms): 15399.1 | learning rate: 3.618E-05 | global batch size: 48 | lm loss: 6.369720E+00 | loss scale: 4096.0 | grad norm: 73939.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5186/ 159576 | consumed samples: 130800 | elapsed time per iteration (ms): 15682.4 | learning rate: 3.620E-05 | global batch size: 48 | lm loss: 6.404753E+00 | loss scale: 4096.0 | grad norm: 71441.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5187/ 159576 | consumed samples: 130848 | elapsed time per iteration (ms): 15500.0 | learning rate: 3.621E-05 | global batch size: 48 | lm loss: 6.418368E+00 | loss scale: 4096.0 | grad norm: 85130.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5188/ 159576 | consumed samples: 130896 | elapsed time per iteration (ms): 15437.0 | learning rate: 3.622E-05 | global batch size: 48 | lm loss: 6.391647E+00 | loss scale: 4096.0 | grad norm: 66283.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5189/ 159576 | consumed samples: 130944 | elapsed time per iteration (ms): 15475.7 | learning rate: 3.624E-05 | global batch size: 48 | lm loss: 6.322616E+00 | loss scale: 4096.0 | grad norm: 75047.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5190/ 159576 | consumed samples: 130992 | elapsed time per iteration (ms): 15579.8 | learning rate: 3.625E-05 | global batch size: 48 | lm loss: 6.431418E+00 | loss scale: 4096.0 | grad norm: 58908.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5191/ 159576 | consumed samples: 131040 | elapsed time per iteration (ms): 15429.7 | learning rate: 3.626E-05 | global batch size: 48 | lm loss: 6.535919E+00 | loss scale: 4096.0 | grad norm: 122859.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5192/ 159576 | consumed samples: 131088 | elapsed time per iteration (ms): 15437.2 | learning rate: 3.628E-05 | global batch size: 48 | lm loss: 6.220134E+00 | loss scale: 4096.0 | grad norm: 92437.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5193/ 159576 | consumed samples: 131136 | elapsed time per iteration (ms): 15429.8 | learning rate: 3.629E-05 | global batch size: 48 | lm loss: 6.373948E+00 | loss scale: 4096.0 | grad norm: 93116.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5194/ 159576 | consumed samples: 131184 | elapsed time per iteration (ms): 15588.8 | learning rate: 3.630E-05 | global batch size: 48 | lm loss: 6.390661E+00 | loss scale: 4096.0 | grad norm: 64520.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
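Most grad norms in this stretch sit in a roughly 50k-105k band, but iterations 5169-5170 jump to ~194k and ~201k before falling straight back. A small sketch for flagging such excursions when post-processing the log, assuming records have already been parsed into (iteration, grad_norm) pairs (function and parameter names are illustrative):

from statistics import median

def flag_grad_spikes(records, factor=2.5, window_size=50):
    """records: iterable of (iteration, grad_norm) pairs.
    Yield steps whose grad norm exceeds `factor` times the median
    of the trailing window of previous steps."""
    window = []
    for it, gnorm in records:
        if len(window) >= 10 and gnorm > factor * median(window):
            yield it, gnorm
        window.append(gnorm)
        window = window[-window_size:]

# Run over the interval above this would flag iterations 5169 and 5170
# (~1.9e5 and ~2.0e5 against a trailing median near 7e4).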
iteration 5195/ 159576 | consumed samples: 131232 | elapsed time per iteration (ms): 15414.6 | learning rate: 3.632E-05 | global batch size: 48 | lm loss: 6.359470E+00 | loss scale: 4096.0 | grad norm: 61039.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5196/ 159576 | consumed samples: 131280 | elapsed time per iteration (ms): 15469.0 | learning rate: 3.633E-05 | global batch size: 48 | lm loss: 6.426967E+00 | loss scale: 4096.0 | grad norm: 69860.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5197/ 159576 | consumed samples: 131328 | elapsed time per iteration (ms): 15399.3 | learning rate: 3.634E-05 | global batch size: 48 | lm loss: 6.397369E+00 | loss scale: 4096.0 | grad norm: 67025.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5198/ 159576 | consumed samples: 131376 | elapsed time per iteration (ms): 15852.9 | learning rate: 3.636E-05 | global batch size: 48 | lm loss: 6.470811E+00 | loss scale: 4096.0 | grad norm: 94172.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5199/ 159576 | consumed samples: 131424 | elapsed time per iteration (ms): 15428.8 | learning rate: 3.637E-05 | global batch size: 48 | lm loss: 6.341267E+00 | loss scale: 4096.0 | grad norm: 73918.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5200/ 159576 | consumed samples: 131472 | elapsed time per iteration (ms): 15444.1 | learning rate: 3.638E-05 | global batch size: 48 | lm loss: 6.434019E+00 | loss scale: 4096.0 | grad norm: 107373.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5201/ 159576 | consumed samples: 131520 | elapsed time per iteration (ms): 15807.8 | learning rate: 3.639E-05 | global batch size: 48 | lm loss: 6.288959E+00 | loss scale: 4096.0 | grad norm: 60538.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5202/ 159576 | consumed samples: 131568 | elapsed time per iteration (ms): 15428.1 | learning rate: 3.641E-05 | global batch size: 48 | lm loss: 6.382991E+00 | loss scale: 4096.0 | grad norm: 87744.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5203/ 159576 | consumed samples: 131616 | elapsed time per iteration (ms): 15473.7 | learning rate: 3.642E-05 | global batch size: 48 | lm loss: 6.421006E+00 | loss scale: 4096.0 | grad norm: 63743.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5204/ 159576 | consumed samples: 131664 | elapsed time per iteration (ms): 15342.5 | learning rate: 3.643E-05 | global batch size: 48 | lm loss: 6.345580E+00 | loss scale: 4096.0 | grad norm: 83317.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5205/ 159576 | consumed samples: 131712 | elapsed time per iteration (ms): 15751.6 | learning rate: 3.645E-05 | global batch size: 48 | lm loss: 6.379266E+00 | loss scale: 4096.0 | grad norm: 72285.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5206/ 159576 | consumed samples: 131760 | elapsed time per iteration (ms): 15391.2 | learning rate: 3.646E-05 | global batch size: 48 | lm loss: 6.296494E+00 | loss scale: 4096.0 | grad norm: 99774.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5207/ 159576 | consumed samples: 131808 | elapsed time per iteration (ms): 15463.8 | learning rate: 3.647E-05 | global batch size: 48 | lm loss: 6.419320E+00 | loss scale: 4096.0 | grad norm: 76787.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5208/ 159576 | consumed samples: 131856 | elapsed time per iteration (ms): 15457.9 | learning rate: 3.649E-05 | global batch size: 48 | lm loss: 6.321754E+00 | loss scale: 4096.0 | grad norm: 71044.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5209/ 159576 | consumed samples: 131904 | elapsed time per iteration (ms): 15812.3 | learning rate: 3.650E-05 | global batch size: 48 | lm loss: 6.295812E+00 | loss scale: 4096.0 | grad norm: 80278.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5210/ 159576 | consumed samples: 131952 | elapsed time per iteration (ms): 15416.3 | learning rate: 3.651E-05 | global batch size: 48 | lm loss: 6.444015E+00 | loss scale: 4096.0 | grad norm: 69086.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5211/ 159576 | consumed samples: 132000 | elapsed time per iteration (ms): 15496.5 | learning rate: 3.653E-05 | global batch size: 48 | lm loss: 6.426943E+00 | loss scale: 4096.0 | grad norm: 87922.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5212/ 159576 | consumed samples: 132048 | elapsed time per iteration (ms): 15327.0 | learning rate: 3.654E-05 | global batch size: 48 | lm loss: 6.361041E+00 | loss scale: 4096.0 | grad norm: 68686.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5213/ 159576 | consumed samples: 132096 | elapsed time per iteration (ms): 15936.5 | learning rate: 3.655E-05 | global batch size: 48 | lm loss: 6.389860E+00 | loss scale: 4096.0 | grad norm: 68529.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5214/ 159576 | consumed samples: 132144 | elapsed time per iteration (ms): 15542.2 | learning rate: 3.657E-05 | global batch size: 48 | lm loss: 6.395509E+00 | loss scale: 4096.0 | grad norm: 66332.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5215/ 159576 | consumed samples: 132192 | elapsed time per iteration (ms): 15481.3 | learning rate: 3.658E-05 | global batch size: 48 | lm loss: 6.378184E+00 | loss scale: 4096.0 | grad norm: 69005.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5216/ 159576 | consumed samples: 132240 | elapsed time per iteration (ms): 15471.0 | learning rate: 3.659E-05 | global batch size: 48 | lm loss: 6.409903E+00 | loss scale: 4096.0 | grad norm: 78238.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5217/ 159576 | consumed samples: 132288 | elapsed time per iteration (ms): 15765.5 | learning rate: 3.661E-05 | global batch size: 48 | lm loss: 6.468248E+00 | loss scale: 4096.0 | grad norm: 81260.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5218/ 159576 | consumed samples: 132336 | elapsed time per iteration (ms): 15514.7 | learning rate: 3.662E-05 | global batch size: 48 | lm loss: 6.462075E+00 | loss scale: 4096.0 | grad norm: 89591.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5219/ 159576 | consumed samples: 132384 | elapsed time per iteration (ms): 15488.0 | learning rate: 3.663E-05 | global batch size: 48 | lm loss: 6.402821E+00 | loss scale: 4096.0 | grad norm: 67243.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5220/ 159576 | consumed samples: 132432 | elapsed time per iteration (ms): 15443.2 | learning rate: 3.665E-05 | global batch size: 48 | lm loss: 6.377299E+00 | loss scale: 4096.0 | grad norm: 73909.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5221/ 159576 | consumed samples: 132480 | elapsed time per iteration (ms): 15695.0 | learning rate: 3.666E-05 | global batch size: 48 | lm loss: 6.451472E+00 | loss scale: 4096.0 | grad norm: 66658.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5222/ 159576 | consumed samples: 132528 | elapsed time per iteration (ms): 15480.5 | learning rate: 3.667E-05 | global batch size: 48 | lm loss: 6.465474E+00 | loss scale: 4096.0 | grad norm: 71303.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5223/ 159576 | consumed samples: 132576 | elapsed time per iteration (ms): 15538.4 | learning rate: 3.669E-05 | global batch size: 48 | lm loss: 6.452018E+00 | loss scale: 4096.0 | grad norm: 61632.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5224/ 159576 | consumed samples: 132624 | elapsed time per iteration (ms): 15433.6 | learning rate: 3.670E-05 | global batch size: 48 | lm loss: 6.417565E+00 | loss scale: 4096.0 | grad norm: 99052.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5225/ 159576 | consumed samples: 132672 | elapsed time per iteration (ms): 16019.0 | learning rate: 3.671E-05 | global batch size: 48 | lm loss: 6.392467E+00 | loss scale: 4096.0 | grad norm: 81901.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5226/ 159576 | consumed samples: 132720 | elapsed time per iteration (ms): 15479.0 | learning rate: 3.673E-05 | global batch size: 48 | lm loss: 6.432102E+00 | loss scale: 4096.0 | grad norm: 80603.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5227/ 159576 | consumed samples: 132768 | elapsed time per iteration (ms): 15499.4 | learning rate: 3.674E-05 | global batch size: 48 | lm loss: 6.304895E+00 | loss scale: 4096.0 | grad norm: 63916.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5228/ 159576 | consumed samples: 132816 | elapsed time per iteration (ms): 15774.2 | learning rate: 3.675E-05 | global batch size: 48 | lm loss: 6.323613E+00 | loss scale: 4096.0 | grad norm: 76694.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5229/ 159576 | consumed samples: 132864 | elapsed time per iteration (ms): 15599.1 | learning rate: 3.677E-05 | global batch size: 48 | lm loss: 6.488564E+00 | loss scale: 4096.0 | grad norm: 76280.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5230/ 159576 | consumed samples: 132912 | elapsed time per iteration (ms): 15549.2 | learning rate: 3.678E-05 | global batch size: 48 | lm loss: 6.430355E+00 | loss scale: 4096.0 | grad norm: 71462.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5231/ 159576 | consumed samples: 132960 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.679E-05 | global batch size: 48 | lm loss: 6.493622E+00 | loss scale: 4096.0 | grad norm: 59853.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5232/ 159576 | consumed samples: 133008 | elapsed time per iteration (ms): 15779.3 | learning rate: 3.681E-05 | global batch size: 48 | lm loss: 6.284019E+00 | loss scale: 4096.0 | grad norm: 69496.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5233/ 159576 | consumed samples: 133056 | elapsed time per iteration (ms): 15428.5 | learning rate: 3.682E-05 | global batch size: 48 | lm loss: 6.267179E+00 | loss scale: 4096.0 | grad norm: 63245.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5234/ 159576 | consumed samples: 133104 | elapsed time per iteration (ms): 15461.3 | learning rate: 3.683E-05 | global batch size: 48 | lm loss: 6.449612E+00 | loss scale: 4096.0 | grad norm: 78199.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5235/ 159576 | consumed samples: 133152 | elapsed time per iteration (ms): 15485.3 | learning rate: 3.685E-05 | global batch size: 48 | lm loss: 6.443536E+00 | loss scale: 4096.0 | grad norm: 70168.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5236/ 159576 | consumed samples: 133200 | elapsed time per iteration (ms): 15933.7 | learning rate: 3.686E-05 | global batch size: 48 | lm loss: 6.244983E+00 | loss scale: 4096.0 | grad norm: 75166.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5237/ 159576 | consumed samples: 133248 | elapsed time per iteration (ms): 15418.0 | learning rate: 3.687E-05 | global batch size: 48 | lm loss: 6.283341E+00 | loss scale: 4096.0 | grad norm: 72463.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5238/ 159576 | consumed samples: 133296 | elapsed time per iteration (ms): 15549.2 | learning rate: 3.689E-05 | global batch size: 48 | lm loss: 6.438685E+00 | loss scale: 4096.0 | grad norm: 82352.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5239/ 159576 | consumed samples: 133344 | elapsed time per iteration (ms): 15537.2 | learning rate: 3.690E-05 | global batch size: 48 | lm loss: 6.362652E+00 | loss scale: 4096.0 | grad norm: 70918.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5240/ 159576 | consumed samples: 133392 | elapsed time per iteration (ms): 15840.0 | learning rate: 3.691E-05 | global batch size: 48 | lm loss: 6.368175E+00 | loss scale: 4096.0 | grad norm: 155104.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5241/ 159576 | consumed samples: 133440 | elapsed time per iteration (ms): 15490.2 | learning rate: 3.693E-05 | global batch size: 48 | lm loss: 6.400668E+00 | loss scale: 4096.0 | grad norm: 68076.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
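The learning rate climbs by about 1.3E-08 per step across this section, and dividing it by consumed samples gives a near-constant ratio (3.510E-05 / 126816 at iteration 5103 and 3.693E-05 / 133440 at 5241 are both about 2.77E-10), which looks like a linear sample-based warmup. A sketch of such a schedule; peak_lr and warmup_samples below are placeholder values for illustration, not this run's actual settings:

def warmup_lr(consumed_samples, peak_lr=6e-5, warmup_samples=216_320):
    # Linear sample-based warmup: lr rises in proportion to consumed
    # samples until warmup_samples, then holds at peak_lr (decay not shown).
    # peak_lr and warmup_samples are assumed placeholders.
    return peak_lr * min(consumed_samples, warmup_samples) / warmup_samples

# Consistency check against the log's own ratios:
#   3.510e-05 / 126816 ~= 3.693e-05 / 133440 ~= 2.77e-10 lr per sample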
iteration 5242/ 159576 | consumed samples: 133488 | elapsed time per iteration (ms): 15382.4 | learning rate: 3.694E-05 | global batch size: 48 | lm loss: 6.316941E+00 | loss scale: 4096.0 | grad norm: 57901.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5243/ 159576 | consumed samples: 133536 | elapsed time per iteration (ms): 15382.2 | learning rate: 3.695E-05 | global batch size: 48 | lm loss: 6.494829E+00 | loss scale: 4096.0 | grad norm: 62287.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5244/ 159576 | consumed samples: 133584 | elapsed time per iteration (ms): 15661.6 | learning rate: 3.697E-05 | global batch size: 48 | lm loss: 6.397869E+00 | loss scale: 4096.0 | grad norm: 57367.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5245/ 159576 | consumed samples: 133632 | elapsed time per iteration (ms): 15495.8 | learning rate: 3.698E-05 | global batch size: 48 | lm loss: 6.256347E+00 | loss scale: 4096.0 | grad norm: 61800.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5246/ 159576 | consumed samples: 133680 | elapsed time per iteration (ms): 15523.0 | learning rate: 3.699E-05 | global batch size: 48 | lm loss: 6.389894E+00 | loss scale: 4096.0 | grad norm: 69126.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5247/ 159576 | consumed samples: 133728 | elapsed time per iteration (ms): 15546.9 | learning rate: 3.701E-05 | global batch size: 48 | lm loss: 6.346736E+00 | loss scale: 4096.0 | grad norm: 67046.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5248/ 159576 | consumed samples: 133776 | elapsed time per iteration (ms): 15650.8 | learning rate: 3.702E-05 | global batch size: 48 | lm loss: 6.430111E+00 | loss scale: 4096.0 | grad norm: 69265.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5249/ 159576 | consumed samples: 133824 | elapsed time per iteration (ms): 15490.3 | learning rate: 3.703E-05 | global batch size: 48 | lm loss: 6.390760E+00 | loss scale: 4096.0 | grad norm: 142507.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5250/ 159576 | consumed samples: 133872 | elapsed time per iteration (ms): 15521.8 | learning rate: 3.705E-05 | global batch size: 48 | lm loss: 6.420756E+00 | loss scale: 4096.0 | grad norm: 64815.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5251/ 159576 | consumed samples: 133920 | elapsed time per iteration (ms): 15759.8 | learning rate: 3.706E-05 | global batch size: 48 | lm loss: 6.360211E+00 | loss scale: 4096.0 | grad norm: 82700.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5252/ 159576 | consumed samples: 133968 | elapsed time per iteration (ms): 15623.8 | learning rate: 3.707E-05 | global batch size: 48 | lm loss: 6.351006E+00 | loss scale: 4096.0 | grad norm: 70030.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-24 23:07:37] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 23:07:37] PULSE: tr8-104B is running for 17:15:26 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 5253/ 159576 | consumed samples: 134016 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.709E-05 | global batch size: 48 | lm loss: 6.395989E+00 | loss scale: 4096.0 | grad norm: 75934.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5254/ 159576 | consumed samples: 134064 | elapsed time per iteration (ms): 15521.6 | learning rate: 3.710E-05 | global batch size: 48 | lm loss: 6.388237E+00 | loss scale: 4096.0 | grad norm: 85225.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5255/ 159576 | consumed samples: 134112 | elapsed time per iteration (ms): 15886.3 | learning rate: 3.711E-05 | global batch size: 48 | lm loss: 6.348703E+00 | loss scale: 4096.0 | grad norm: 72802.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5256/ 159576 | consumed samples: 134160 | elapsed time per iteration (ms): 15520.3 | learning rate: 3.713E-05 | global batch size: 48 | lm loss: 6.321572E+00 | loss scale: 4096.0 | grad norm: 73245.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5257/ 159576 | consumed samples: 134208 | elapsed time per iteration (ms): 15443.7 | learning rate: 3.714E-05 | global batch size: 48 | lm loss: 6.335665E+00 | loss scale: 4096.0 | grad norm: 58798.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5258/ 159576 | consumed samples: 134256 | elapsed time per iteration (ms): 15427.0 | learning rate: 3.715E-05 | global batch size: 48 | lm loss: 6.319070E+00 | loss scale: 4096.0 | grad norm: 66591.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5259/ 159576 | consumed samples: 134304 | elapsed time per iteration (ms): 15760.6 | learning rate: 3.717E-05 | global batch size: 48 | lm loss: 6.229961E+00 | loss scale: 4096.0 | grad norm: 78411.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5260/ 159576 | consumed samples: 134352 | elapsed time per iteration (ms): 15544.0 | learning rate: 3.718E-05 | global batch size: 48 | lm loss: 6.379896E+00 | loss scale: 4096.0 | grad norm: 82294.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5261/ 159576 | consumed samples: 134400 | elapsed time per iteration (ms): 15397.8 | learning rate: 3.719E-05 | global batch size: 48 | lm loss: 6.233184E+00 | loss scale: 4096.0 | grad norm: 65525.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5262/ 159576 | consumed samples: 134448 | elapsed time per iteration (ms): 15498.3 | learning rate: 3.721E-05 | global batch size: 48 | lm loss: 6.326461E+00 | loss scale: 4096.0 | grad norm: 101232.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5263/ 159576 | consumed samples: 134496 | elapsed time per iteration (ms): 15834.8 | learning rate: 3.722E-05 | global batch size: 48 | lm loss: 6.351873E+00 | loss scale: 4096.0 | grad norm: 82652.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5264/ 159576 | consumed samples: 134544 | elapsed time per iteration (ms): 15450.4 | learning rate: 3.723E-05 | global batch size: 48 | lm loss: 6.411518E+00 | loss scale: 4096.0 | grad norm: 79704.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5265/ 159576 | consumed samples: 134592 | elapsed time per iteration (ms): 15408.5 | learning rate: 3.725E-05 | global batch size: 48 | lm loss: 6.324855E+00 | loss scale: 4096.0 | grad norm: 96783.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5266/ 159576 | consumed samples: 134640 | elapsed time per iteration (ms): 15369.4 | learning rate: 3.726E-05 | global batch size: 48 | lm loss: 6.351592E+00 | loss scale: 4096.0 | grad norm: 96231.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5267/ 159576 | consumed samples: 134688 | elapsed time per iteration (ms): 15643.8 | learning rate: 3.727E-05 | global batch size: 48 | lm loss: 6.439371E+00 | loss scale: 4096.0 | grad norm: 86165.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5268/ 159576 | consumed samples: 134736 | elapsed time per iteration (ms): 15428.0 | learning rate: 3.729E-05 | global batch size: 48 | lm loss: 6.282881E+00 | loss scale: 4096.0 | grad norm: 95370.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5269/ 159576 | consumed samples: 134784 | elapsed time per iteration (ms): 15422.7 | learning rate: 3.730E-05 | global batch size: 48 | lm loss: 6.489480E+00 | loss scale: 4096.0 | grad norm: 77407.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5270/ 159576 | consumed samples: 134832 | elapsed time per iteration (ms): 15384.0 | learning rate: 3.731E-05 | global batch size: 48 | lm loss: 6.382200E+00 | loss scale: 4096.0 | grad norm: 66716.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5271/ 159576 | consumed samples: 134880 | elapsed time per iteration (ms): 15581.8 | learning rate: 3.733E-05 | global batch size: 48 | lm loss: 6.409722E+00 | loss scale: 4096.0 | grad norm: 68218.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5272/ 159576 | consumed samples: 134928 | elapsed time per iteration (ms): 15395.7 | learning rate: 3.734E-05 | global batch size: 48 | lm loss: 6.493249E+00 | loss scale: 4096.0 | grad norm: 71580.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5273/ 159576 | consumed samples: 134976 | elapsed time per iteration (ms): 15402.4 | learning rate: 3.735E-05 | global batch size: 48 | lm loss: 6.376624E+00 | loss scale: 4096.0 | grad norm: 85075.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5274/ 159576 | consumed samples: 135024 | elapsed time per iteration (ms): 15424.2 | learning rate: 3.737E-05 | global batch size: 48 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 75286.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5275/ 159576 | consumed samples: 135072 | elapsed time per iteration (ms): 15616.5 | learning rate: 3.738E-05 | global batch size: 48 | lm loss: 6.428281E+00 | loss scale: 4096.0 | grad norm: 71317.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5276/ 159576 | consumed samples: 135120 | elapsed time per iteration (ms): 15383.8 | learning rate: 3.739E-05 | global batch size: 48 | lm loss: 6.324539E+00 | loss scale: 4096.0 | grad norm: 70509.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5277/ 159576 | consumed samples: 135168 | elapsed time per iteration (ms): 15404.4 | learning rate: 3.741E-05 | global batch size: 48 | lm loss: 6.396560E+00 | loss scale: 4096.0 | grad norm: 68223.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5278/ 159576 | consumed samples: 135216 | elapsed time per iteration (ms): 15464.0 | learning rate: 3.742E-05 | global batch size: 48 | lm loss: 6.403405E+00 | loss scale: 4096.0 | grad norm: 74828.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5279/ 159576 | consumed samples: 135264 | elapsed time per iteration (ms): 15572.0 | learning rate: 3.743E-05 | global batch size: 48 | lm loss: 6.340907E+00 | loss scale: 4096.0 | grad norm: 103719.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5280/ 159576 | consumed samples: 135312 | elapsed time per iteration (ms): 15390.1 | learning rate: 3.745E-05 | global batch size: 48 | lm loss: 6.465801E+00 | loss scale: 4096.0 | grad norm: 71954.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5281/ 159576 | consumed samples: 135360 | elapsed time per iteration (ms): 15379.3 | learning rate: 3.746E-05 | global batch size: 48 | lm loss: 6.481463E+00 | loss scale: 4096.0 | grad norm: 64156.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5282/ 159576 | consumed samples: 135408 | elapsed time per iteration (ms): 15880.0 | learning rate: 3.747E-05 | global batch size: 48 | lm loss: 6.324627E+00 | loss scale: 4096.0 | grad norm: 77974.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5283/ 159576 | consumed samples: 135456 | elapsed time per iteration (ms): 15461.2 | learning rate: 3.749E-05 | global batch size: 48 | lm loss: 6.278036E+00 | loss scale: 4096.0 | grad norm: 78417.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5284/ 159576 | consumed samples: 135504 | elapsed time per iteration (ms): 15434.3 | learning rate: 3.750E-05 | global batch size: 48 | lm loss: 6.470399E+00 | loss scale: 4096.0 | grad norm: 70677.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5285/ 159576 | consumed samples: 135552 | elapsed time per iteration (ms): 15453.3 | learning rate: 3.751E-05 | global batch size: 48 | lm loss: 6.465354E+00 | loss scale: 4096.0 | grad norm: 72699.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5286/ 159576 | consumed samples: 135600 | elapsed time per iteration (ms): 15799.4 | learning rate: 3.753E-05 | global batch size: 48 | lm loss: 6.366466E+00 | loss scale: 4096.0 | grad norm: 87890.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5287/ 159576 | consumed samples: 135648 | elapsed time per iteration (ms): 15462.6 | learning rate: 3.754E-05 | global batch size: 48 | lm loss: 6.450302E+00 | loss scale: 4096.0 | grad norm: 65500.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5288/ 159576 | consumed samples: 135696 | elapsed time per iteration (ms): 15449.3 | learning rate: 3.755E-05 | global batch size: 48 | lm loss: 6.211058E+00 | loss scale: 4096.0 | grad norm: 91309.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5289/ 159576 | consumed samples: 135744 | elapsed time per iteration (ms): 15440.0 | learning rate: 3.757E-05 | global batch size: 48 | lm loss: 6.439297E+00 | loss scale: 4096.0 | grad norm: 78139.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5290/ 159576 | consumed samples: 135792 | elapsed time per iteration (ms): 15759.6 | learning rate: 3.758E-05 | global batch size: 48 | lm loss: 6.295393E+00 | loss scale: 4096.0 | grad norm: 67343.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5291/ 159576 | consumed samples: 135840 | elapsed time per iteration (ms): 15513.6 | learning rate: 3.759E-05 | global batch size: 48 | lm loss: 6.403075E+00 | loss scale: 4096.0 | grad norm: 88227.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5292/ 159576 | consumed samples: 135888 | elapsed time per iteration (ms): 15421.3 | learning rate: 3.761E-05 | global batch size: 48 | lm loss: 6.414333E+00 | loss scale: 4096.0 | grad norm: 78788.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5293/ 159576 | consumed samples: 135936 | elapsed time per iteration (ms): 15345.3 | learning rate: 3.762E-05 | global batch size: 48 | lm loss: 6.292488E+00 | loss scale: 4096.0 | grad norm: 59708.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5294/ 159576 | consumed samples: 135984 | elapsed time per iteration (ms): 16027.7 | learning rate: 3.763E-05 | global batch size: 48 | lm loss: 6.385753E+00 | loss scale: 4096.0 | grad norm: 102775.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5295/ 159576 | consumed samples: 136032 | elapsed time per iteration (ms): 15461.5 | learning rate: 3.765E-05 | global batch size: 48 | lm loss: 6.324437E+00 | loss scale: 4096.0 | grad norm: 71697.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5296/ 159576 | consumed samples: 136080 | elapsed time per iteration (ms): 15433.9 | learning rate: 3.766E-05 | global batch size: 48 | lm loss: 6.384956E+00 | loss scale: 4096.0 | grad norm: 102953.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5297/ 159576 | consumed samples: 136128 | elapsed time per iteration (ms): 15429.7 | learning rate: 3.767E-05 | global batch size: 48 | lm loss: 6.436825E+00 | loss scale: 4096.0 | grad norm: 75031.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5298/ 159576 | consumed samples: 136176 | elapsed time per iteration (ms): 15818.4 | learning rate: 3.769E-05 | global batch size: 48 | lm loss: 6.482272E+00 | loss scale: 4096.0 | grad norm: 65276.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5299/ 159576 | consumed samples: 136224 | elapsed time per iteration (ms): 15441.5 | learning rate: 3.770E-05 | global batch size: 48 | lm loss: 6.589076E+00 | loss scale: 4096.0 | grad norm: 121561.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5300/ 159576 | consumed samples: 136272 | elapsed time per iteration (ms): 15422.2 | learning rate: 3.771E-05 | global batch size: 48 | lm loss: 6.405668E+00 | loss scale: 4096.0 | grad norm: 62093.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5301/ 159576 | consumed samples: 136320 | elapsed time per iteration (ms): 15355.0 | learning rate: 3.773E-05 | global batch size: 48 | lm loss: 6.390646E+00 | loss scale: 4096.0 | grad norm: 56038.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5302/ 159576 | consumed samples: 136368 | elapsed time per iteration (ms): 15565.3 | learning rate: 3.774E-05 | global batch size: 48 | lm loss: 6.410752E+00 | loss scale: 4096.0 | grad norm: 64581.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5303/ 159576 | consumed samples: 136416 | elapsed time per iteration (ms): 15422.3 | learning rate: 3.775E-05 | global batch size: 48 | lm loss: 6.448494E+00 | loss scale: 4096.0 | grad norm: 77740.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5304/ 159576 | consumed samples: 136464 | elapsed time per iteration (ms): 15454.6 | learning rate: 3.777E-05 | global batch size: 48 | lm loss: 6.436998E+00 | loss scale: 4096.0 | grad norm: 86587.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5305/ 159576 | consumed samples: 136512 | elapsed time per iteration (ms): 15410.7 | learning rate: 3.778E-05 | global batch size: 48 | lm loss: 6.360906E+00 | loss scale: 4096.0 | grad norm: 102483.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5306/ 159576 | consumed samples: 136560 | elapsed time per iteration (ms): 15590.5 | learning rate: 3.779E-05 | global batch size: 48 | lm loss: 6.449046E+00 | loss scale: 4096.0 | grad norm: 63898.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5307/ 159576 | consumed samples: 136608 | elapsed time per iteration (ms): 15506.8 | learning rate: 3.781E-05 | global batch size: 48 | lm loss: 6.467348E+00 | loss scale: 4096.0 | grad norm: 66863.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5308/ 159576 | consumed samples: 136656 | elapsed time per iteration (ms): 15351.0 | learning rate: 3.782E-05 | global batch size: 48 | lm loss: 6.301440E+00 | loss scale: 4096.0 | grad norm: 66038.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5309/ 159576 | consumed samples: 136704 | elapsed time per iteration (ms): 15547.1 | learning rate: 3.783E-05 | global batch size: 48 | lm loss: 6.314401E+00 | loss scale: 4096.0 | grad norm: 100622.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5310/ 159576 | consumed samples: 136752 | elapsed time per iteration (ms): 15714.1 | learning rate: 3.785E-05 | global batch size: 48 | lm loss: 6.474138E+00 | loss scale: 4096.0 | grad norm: 100713.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5311/ 159576 | consumed samples: 136800 | elapsed time per iteration (ms): 15441.4 | learning rate: 3.786E-05 | global batch size: 48 | lm loss: 6.429978E+00 | loss scale: 4096.0 | grad norm: 73118.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5312/ 159576 | consumed samples: 136848 | elapsed time per iteration (ms): 15448.2 | learning rate: 3.787E-05 | global batch size: 48 | lm loss: 6.322928E+00 | loss scale: 4096.0 | grad norm: 79244.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5313/ 159576 | consumed samples: 136896 | elapsed time per iteration (ms): 15801.3 | learning rate: 3.789E-05 | global batch size: 48 | lm loss: 6.536728E+00 | loss scale: 4096.0 | grad norm: 80004.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5314/ 159576 | consumed samples: 136944 | elapsed time per iteration (ms): 15420.7 | learning rate: 3.790E-05 | global batch size: 48 | lm loss: 6.358313E+00 | loss scale: 4096.0 | grad norm: 73656.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5315/ 159576 | consumed samples: 136992 | elapsed time per iteration (ms): 15430.5 | learning rate: 3.791E-05 | global batch size: 48 | lm loss: 6.285139E+00 | loss scale: 4096.0 | grad norm: 72555.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5316/ 159576 | consumed samples: 137040 | elapsed time per iteration (ms): 15418.3 | learning rate: 3.793E-05 | global batch size: 48 | lm loss: 6.355993E+00 | loss scale: 4096.0 | grad norm: 89604.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5317/ 159576 | consumed samples: 137088 | elapsed time per iteration (ms): 15767.6 | learning rate: 3.794E-05 | global batch size: 48 | lm loss: 6.370296E+00 | loss scale: 4096.0 | grad norm: 68760.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5318/ 159576 | consumed samples: 137136 | elapsed time per iteration (ms): 15469.0 | learning rate: 3.795E-05 | global batch size: 48 | lm loss: 6.401207E+00 | loss scale: 4096.0 | grad norm: 64825.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5319/ 159576 | consumed samples: 137184 | elapsed time per iteration (ms): 15469.4 | learning rate: 3.797E-05 | global batch size: 48 | lm loss: 6.433188E+00 | loss scale: 4096.0 | grad norm: 75954.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5320/ 159576 | consumed samples: 137232 | elapsed time per iteration (ms): 15484.0 | learning rate: 3.798E-05 | global batch size: 48 | lm loss: 6.422481E+00 | loss scale: 4096.0 | grad norm: 85143.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5321/ 159576 | consumed samples: 137280 | elapsed time per iteration (ms): 15773.2 | learning rate: 3.799E-05 | global batch size: 48 | lm loss: 6.394318E+00 | loss scale: 4096.0 | grad norm: 81431.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5322/ 159576 | consumed
samples: 137328 | elapsed time per iteration (ms): 15339.5 | learning rate: 3.801E-05 | global batch size: 48 | lm loss: 6.498918E+00 | loss scale: 4096.0 | grad norm: 76418.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5323/ 159576 | consumed samples: 137376 | elapsed time per iteration (ms): 15420.7 | learning rate: 3.802E-05 | global batch size: 48 | lm loss: 6.518599E+00 | loss scale: 4096.0 | grad norm: 71705.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5324/ 159576 | consumed samples: 137424 | elapsed time per iteration (ms): 15420.3 | learning rate: 3.803E-05 | global batch size: 48 | lm loss: 6.429631E+00 | loss scale: 4096.0 | grad norm: 57358.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5325/ 159576 | consumed samples: 137472 | elapsed time per iteration (ms): 15903.1 | learning rate: 3.805E-05 | global batch size: 48 | lm loss: 6.407781E+00 | loss scale: 4096.0 | grad norm: 91506.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5326/ 159576 | consumed samples: 137520 | elapsed time per iteration (ms): 15425.4 | learning rate: 3.806E-05 | global batch size: 48 | lm loss: 6.399868E+00 | loss scale: 4096.0 | grad norm: 68843.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5327/ 159576 | consumed samples: 137568 | elapsed time per iteration (ms): 15444.3 | learning rate: 3.807E-05 | global batch size: 48 | lm loss: 6.412372E+00 | loss scale: 4096.0 | grad norm: 67149.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5328/ 159576 | consumed samples: 137616 | elapsed time per iteration (ms): 15406.6 | learning rate: 3.809E-05 | global batch size: 48 | lm loss: 6.430699E+00 | loss scale: 4096.0 | grad norm: 102742.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5329/ 159576 | consumed samples: 137664 | elapsed time per iteration (ms): 15722.7 | learning rate: 3.810E-05 | global batch size: 48 | lm loss: 6.415520E+00 | loss scale: 4096.0 | grad norm: 73301.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5330/ 159576 | consumed samples: 137712 | elapsed time per iteration (ms): 15405.0 | learning rate: 3.811E-05 | global batch size: 48 | lm loss: 6.359590E+00 | loss scale: 4096.0 | grad norm: 70222.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5331/ 159576 | consumed samples: 137760 | elapsed time per iteration (ms): 15374.6 | learning rate: 3.813E-05 | global batch size: 48 | lm loss: 6.443409E+00 | loss scale: 4096.0 | grad norm: 79619.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5332/ 159576 | consumed samples: 137808 | elapsed time per iteration (ms): 15404.3 | learning rate: 3.814E-05 | global batch size: 48 | lm loss: 6.412749E+00 | loss scale: 4096.0 | grad norm: 110889.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5333/ 159576 | consumed samples: 137856 | elapsed time per iteration (ms): 15590.4 | learning rate: 3.815E-05 | global batch size: 48 | lm loss: 6.492513E+00 | loss scale: 4096.0 | grad norm: 80255.448 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5334/ 159576 | consumed samples: 137904 | elapsed time per iteration (ms): 15436.5 | learning rate: 3.817E-05 | global batch size: 48 | lm loss: 6.400149E+00 | loss scale: 4096.0 | grad norm: 69554.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5335/ 159576 | consumed samples: 137952 | elapsed time per iteration (ms): 15422.0 | learning rate: 3.818E-05 | global batch size: 48 | lm loss: 6.473186E+00 | loss scale: 4096.0 | grad norm: 96185.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5336/ 159576 | consumed samples: 138000 | elapsed time per iteration (ms): 15442.7 | learning rate: 3.819E-05 | global batch size: 48 | lm loss: 6.552884E+00 | loss scale: 4096.0 | grad norm: 73254.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5337/ 159576 | consumed samples: 138048 | elapsed time per iteration (ms): 15634.6 | learning rate: 3.821E-05 | global batch size: 48 | lm loss: 6.365612E+00 | loss scale: 4096.0 | grad norm: 57539.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5338/ 159576 | consumed samples: 138096 | elapsed time per iteration (ms): 15386.8 | learning rate: 3.822E-05 | global batch size: 48 | lm loss: 6.445109E+00 | loss scale: 4096.0 | grad norm: 67382.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5339/ 159576 | consumed samples: 138144 | elapsed time per iteration (ms): 15470.1 | learning rate: 3.823E-05 | global batch size: 48 | lm loss: 6.353713E+00 | loss scale: 4096.0 | grad norm: 110272.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5340/ 159576 | consumed samples: 138192 | elapsed time per iteration (ms): 15791.0 | learning rate: 3.825E-05 | global batch size: 48 | lm loss: 6.413539E+00 | loss scale: 4096.0 | grad norm: 72349.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5341/ 159576 | consumed samples: 138240 | elapsed time per iteration (ms): 15411.4 | learning rate: 3.826E-05 | global batch size: 48 | lm loss: 6.347322E+00 | loss scale: 4096.0 | grad norm: 61859.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5342/ 159576 | consumed samples: 138288 | elapsed time per iteration (ms): 15471.9 | learning rate: 3.827E-05 | global batch size: 48 | lm loss: 6.298682E+00 | loss scale: 4096.0 | grad norm: 78125.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5343/ 159576 | consumed samples: 138336 | elapsed time per iteration (ms): 15450.5 | learning rate: 3.829E-05 | global batch size: 48 | lm loss: 6.346509E+00 | loss scale: 4096.0 | grad norm: 76921.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5344/ 159576 | consumed samples: 138384 | elapsed time per iteration (ms): 15797.4 | learning rate: 3.830E-05 | global batch size: 48 | lm loss: 6.464560E+00 | loss scale: 4096.0 | grad norm: 73833.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5345/ 159576 | consumed samples: 138432 | elapsed time per iteration (ms): 15447.3 | learning rate: 3.831E-05 | 
global batch size: 48 | lm loss: 6.491942E+00 | loss scale: 4096.0 | grad norm: 58609.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5346/ 159576 | consumed samples: 138480 | elapsed time per iteration (ms): 15470.6 | learning rate: 3.833E-05 | global batch size: 48 | lm loss: 6.408776E+00 | loss scale: 4096.0 | grad norm: 61084.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5347/ 159576 | consumed samples: 138528 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.834E-05 | global batch size: 48 | lm loss: 6.317072E+00 | loss scale: 4096.0 | grad norm: 79107.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5348/ 159576 | consumed samples: 138576 | elapsed time per iteration (ms): 15857.5 | learning rate: 3.835E-05 | global batch size: 48 | lm loss: 6.342214E+00 | loss scale: 4096.0 | grad norm: 82396.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5349/ 159576 | consumed samples: 138624 | elapsed time per iteration (ms): 15501.3 | learning rate: 3.837E-05 | global batch size: 48 | lm loss: 6.416060E+00 | loss scale: 4096.0 | grad norm: 58909.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5350/ 159576 | consumed samples: 138672 | elapsed time per iteration (ms): 15334.9 | learning rate: 3.838E-05 | global batch size: 48 | lm loss: 6.348287E+00 | loss scale: 4096.0 | grad norm: 54069.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5351/ 159576 | consumed samples: 138720 | elapsed time per iteration (ms): 15454.2 | learning rate: 3.839E-05 | global batch size: 48 | lm loss: 6.456007E+00 | loss scale: 4096.0 | grad norm: 61307.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5352/ 159576 | consumed samples: 138768 | elapsed time per iteration (ms): 15972.1 | learning rate: 3.841E-05 | global batch size: 48 | lm loss: 6.276731E+00 | loss scale: 4096.0 | grad norm: 62789.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5353/ 159576 | consumed samples: 138816 | elapsed time per iteration (ms): 15447.0 | learning rate: 3.842E-05 | global batch size: 48 | lm loss: 6.443192E+00 | loss scale: 4096.0 | grad norm: 75454.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5354/ 159576 | consumed samples: 138864 | elapsed time per iteration (ms): 15426.1 | learning rate: 3.843E-05 | global batch size: 48 | lm loss: 6.301665E+00 | loss scale: 4096.0 | grad norm: 66381.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5355/ 159576 | consumed samples: 138912 | elapsed time per iteration (ms): 15465.4 | learning rate: 3.845E-05 | global batch size: 48 | lm loss: 6.453572E+00 | loss scale: 4096.0 | grad norm: 63236.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5356/ 159576 | consumed samples: 138960 | elapsed time per iteration (ms): 15595.7 | learning rate: 3.846E-05 | global batch size: 48 | lm loss: 6.391494E+00 | loss scale: 4096.0 | grad norm: 78457.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5357/ 
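[editor's note: each record in this stream is a pipe-delimited run of "key: value" fields after the "iteration N/ TOTAL" header (the trailing "time (ms)" field is emitted empty by the logger). Below is a minimal parsing sketch, assuming the stream was captured to a hypothetical file named train.log; the regex and helper are illustrative, not part of the training code.

    import re

    # One iteration record per line, as reflowed above; "train.log" is a
    # hypothetical file holding this captured stream.
    RECORD = re.compile(r"iteration\s+(\d+)/\s*(\d+)\s*\|(.*)")

    def parse_record(line):
        m = RECORD.search(line)
        if m is None:
            return None  # PULSE lines and wrapped fragments are passed over
        rec = {"iteration": int(m.group(1)), "total iterations": int(m.group(2))}
        for field in m.group(3).split("|"):
            key, sep, value = field.partition(":")
            if not sep:
                continue  # the trailing "time (ms)" field carries no value
            try:
                rec[key.strip()] = float(value)
            except ValueError:
                pass
        return rec

    with open("train.log") as f:
        records = [r for r in map(parse_record, f) if r is not None]
    print(records[0]["lm loss"], records[-1]["learning rate"])

end of note]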
iteration 5357/ 159576 | consumed samples: 139008 | elapsed time per iteration (ms): 15508.4 | learning rate: 3.847E-05 | global batch size: 48 | lm loss: 6.379974E+00 | loss scale: 4096.0 | grad norm: 85282.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5358/ 159576 | consumed samples: 139056 | elapsed time per iteration (ms): 15495.7 | learning rate: 3.849E-05 | global batch size: 48 | lm loss: 6.517261E+00 | loss scale: 4096.0 | grad norm: 75329.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5359/ 159576 | consumed samples: 139104 | elapsed time per iteration (ms): 15455.1 | learning rate: 3.850E-05 | global batch size: 48 | lm loss: 6.311386E+00 | loss scale: 4096.0 | grad norm: 74599.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5360/ 159576 | consumed samples: 139152 | elapsed time per iteration (ms): 15693.4 | learning rate: 3.851E-05 | global batch size: 48 | lm loss: 6.481428E+00 | loss scale: 4096.0 | grad norm: 77215.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5361/ 159576 | consumed samples: 139200 | elapsed time per iteration (ms): 15475.6 | learning rate: 3.853E-05 | global batch size: 48 | lm loss: 6.331719E+00 | loss scale: 4096.0 | grad norm: 60279.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5362/ 159576 | consumed samples: 139248 | elapsed time per iteration (ms): 15551.6 | learning rate: 3.854E-05 | global batch size: 48 | lm loss: 6.506707E+00 | loss scale: 4096.0 | grad norm: 57442.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5363/ 159576 | consumed samples: 139296 | elapsed time per iteration (ms): 15525.0 | learning rate: 3.855E-05 | global batch size: 48 | lm loss: 6.283090E+00 | loss scale: 4096.0 | grad norm: 69167.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5364/ 159576 | consumed samples: 139344 | elapsed time per iteration (ms): 15703.9 | learning rate: 3.857E-05 | global batch size: 48 | lm loss: 6.344968E+00 | loss scale: 4096.0 | grad norm: 66351.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5365/ 159576 | consumed samples: 139392 | elapsed time per iteration (ms): 15511.9 | learning rate: 3.858E-05 | global batch size: 48 | lm loss: 6.402239E+00 | loss scale: 4096.0 | grad norm: 69893.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5366/ 159576 | consumed samples: 139440 | elapsed time per iteration (ms): 15507.6 | learning rate: 3.859E-05 | global batch size: 48 | lm loss: 6.510591E+00 | loss scale: 4096.0 | grad norm: 73294.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5367/ 159576 | consumed samples: 139488 | elapsed time per iteration (ms): 15841.0 | learning rate: 3.861E-05 | global batch size: 48 | lm loss: 6.292207E+00 | loss scale: 4096.0 | grad norm: 69220.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5368/ 159576 | consumed samples: 139536 | elapsed time per iteration (ms): 15748.2 | learning rate: 3.862E-05 | global batch size: 48 | lm loss: 6.492587E+00 | loss scale: 4096.0 | grad norm: 78294.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5369/ 159576 | consumed samples: 139584 | elapsed time per iteration (ms): 15492.3 | learning rate: 3.863E-05 | global batch size: 48 | lm loss: 6.493845E+00 | loss scale: 4096.0 | grad norm: 94517.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5370/ 159576 | consumed samples: 139632 | elapsed time per iteration (ms): 15493.8 | learning rate: 3.864E-05 | global batch size: 48 | lm loss: 6.430061E+00 | loss scale: 4096.0 | grad norm: 77523.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5371/ 159576 | consumed samples: 139680 | elapsed time per iteration (ms): 15870.2 | learning rate: 3.866E-05 | global batch size: 48 | lm loss: 6.411311E+00 | loss scale: 4096.0 | grad norm: 69582.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5372/ 159576 | consumed samples: 139728 | elapsed time per iteration (ms): 15517.9 | learning rate: 3.867E-05 | global batch size: 48 | lm loss: 6.515477E+00 | loss scale: 4096.0 | grad norm: 75626.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5373/ 159576 | consumed samples: 139776 | elapsed time per iteration (ms): 15491.8 | learning rate: 3.868E-05 | global batch size: 48 | lm loss: 6.453342E+00 | loss scale: 4096.0 | grad norm: 69940.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5374/ 159576 | consumed samples: 139824 | elapsed time per iteration (ms): 15511.6 | learning rate: 3.870E-05 | global batch size: 48 | lm loss: 6.378087E+00 | loss scale: 4096.0 | grad norm: 70420.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5375/ 159576 | consumed samples: 139872 | elapsed time per iteration (ms): 15836.7 | learning rate: 3.871E-05 | global batch size: 48 | lm loss: 6.371119E+00 | loss scale: 4096.0 | grad norm: 56046.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5376/ 159576 | consumed samples: 139920 | elapsed time per iteration (ms): 15468.7 | learning rate: 3.872E-05 | global batch size: 48 | lm loss: 6.480386E+00 | loss scale: 4096.0 | grad norm: 67254.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5377/ 159576 | consumed samples: 139968 | elapsed time per iteration (ms): 15505.8 | learning rate: 3.874E-05 | global batch size: 48 | lm loss: 6.445705E+00 | loss scale: 4096.0 | grad norm: 58120.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5378/ 159576 | consumed samples: 140016 | elapsed time per iteration (ms): 15512.2 | learning rate: 3.875E-05 | global batch size: 48 | lm loss: 6.383876E+00 | loss scale: 4096.0 | grad norm: 63811.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5379/ 159576 | consumed samples: 140064 | elapsed time per iteration (ms): 15885.3 | learning rate: 3.876E-05 | global batch size: 48 | lm loss: 6.430426E+00 | loss scale: 4096.0 | grad norm: 71627.105 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5380/ 159576 | consumed samples: 140112 | elapsed time per iteration (ms): 15514.4 | learning rate: 3.878E-05 | global batch size: 48 | lm loss: 6.352599E+00 | loss scale: 4096.0 | grad norm: 55768.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5381/ 159576 | consumed samples: 140160 | elapsed time per iteration (ms): 15536.5 | learning rate: 3.879E-05 | global batch size: 48 | lm loss: 6.462265E+00 | loss scale: 4096.0 | grad norm: 76307.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5382/ 159576 | consumed samples: 140208 | elapsed time per iteration (ms): 15499.8 | learning rate: 3.880E-05 | global batch size: 48 | lm loss: 6.439154E+00 | loss scale: 4096.0 | grad norm: 97619.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5383/ 159576 | consumed samples: 140256 | elapsed time per iteration (ms): 15693.9 | learning rate: 3.882E-05 | global batch size: 48 | lm loss: 6.327425E+00 | loss scale: 4096.0 | grad norm: 69803.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5384/ 159576 | consumed samples: 140304 | elapsed time per iteration (ms): 15550.5 | learning rate: 3.883E-05 | global batch size: 48 | lm loss: 6.391693E+00 | loss scale: 4096.0 | grad norm: 66211.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5385/ 159576 | consumed samples: 140352 | elapsed time per iteration (ms): 15520.0 | learning rate: 3.884E-05 | global batch size: 48 | lm loss: 6.323473E+00 | loss scale: 4096.0 | grad norm: 68034.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5386/ 159576 | consumed samples: 140400 | elapsed time per iteration (ms): 15545.0 | learning rate: 3.886E-05 | global batch size: 48 | lm loss: 6.299393E+00 | loss scale: 4096.0 | grad norm: 85492.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5387/ 159576 | consumed samples: 140448 | elapsed time per iteration (ms): 15684.9 | learning rate: 3.887E-05 | global batch size: 48 | lm loss: 6.374225E+00 | loss scale: 4096.0 | grad norm: 72949.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5388/ 159576 | consumed samples: 140496 | elapsed time per iteration (ms): 15553.2 | learning rate: 3.888E-05 | global batch size: 48 | lm loss: 6.446224E+00 | loss scale: 4096.0 | grad norm: 83315.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5389/ 159576 | consumed samples: 140544 | elapsed time per iteration (ms): 15520.1 | learning rate: 3.890E-05 | global batch size: 48 | lm loss: 6.336344E+00 | loss scale: 4096.0 | grad norm: 60566.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5390/ 159576 | consumed samples: 140592 | elapsed time per iteration (ms): 15438.2 | learning rate: 3.891E-05 | global batch size: 48 | lm loss: 6.437949E+00 | loss scale: 4096.0 | grad norm: 93800.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5391/ 159576 | consumed samples: 140640 | elapsed time per iteration (ms): 15842.4 | learning rate: 3.892E-05 | global batch size: 48 | lm loss: 6.445059E+00 | loss scale: 4096.0 | grad norm: 67207.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5392/ 159576 | consumed samples: 140688 | elapsed time per iteration (ms): 15543.4 | learning rate: 3.894E-05 | global batch size: 48 | lm loss: 6.340952E+00 | loss scale: 4096.0 | grad norm: 92289.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5393/ 159576 | consumed samples: 140736 | elapsed time per iteration (ms): 15518.9 | learning rate: 3.895E-05 | global batch size: 48 | lm loss: 6.416577E+00 | loss scale: 4096.0 | grad norm: 84099.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5394/ 159576 | consumed samples: 140784 | elapsed time per iteration (ms): 15997.3 | learning rate: 3.896E-05 | global batch size: 48 | lm loss: 6.439622E+00 | loss scale: 4096.0 | grad norm: 54809.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5395/ 159576 | consumed samples: 140832 | elapsed time per iteration (ms): 15450.3 | learning rate: 3.898E-05 | global batch size: 48 | lm loss: 6.441430E+00 | loss scale: 4096.0 | grad norm: 63144.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5396/ 159576 | consumed samples: 140880 | elapsed time per iteration (ms): 15568.2 | learning rate: 3.899E-05 | global batch size: 48 | lm loss: 6.424047E+00 | loss scale: 4096.0 | grad norm: 106261.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5397/ 159576 | consumed samples: 140928 | elapsed time per iteration (ms): 15464.4 | learning rate: 3.900E-05 | global batch size: 48 | lm loss: 6.325677E+00 | loss scale: 4096.0 | grad norm: 64383.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5398/ 159576 | consumed samples: 140976 | elapsed time per iteration (ms): 15883.9 | learning rate: 3.902E-05 | global batch size: 48 | lm loss: 6.582463E+00 | loss scale: 4096.0 | grad norm: 66662.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5399/ 159576 | consumed samples: 141024 | elapsed time per iteration (ms): 15497.5 | learning rate: 3.903E-05 | global batch size: 48 | lm loss: 6.498641E+00 | loss scale: 4096.0 | grad norm: 59391.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5400/ 159576 | consumed samples: 141072 | elapsed time per iteration (ms): 15569.9 | learning rate: 3.904E-05 | global batch size: 48 | lm loss: 6.283938E+00 | loss scale: 4096.0 | grad norm: 64487.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5401/ 159576 | consumed samples: 141120 | elapsed time per iteration (ms): 15526.8 | learning rate: 3.906E-05 | global batch size: 48 | lm loss: 6.336715E+00 | loss scale: 4096.0 | grad norm: 57781.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5402/ 159576 | consumed samples: 141168 | elapsed time per iteration (ms): 15981.6 | learning rate: 3.907E-05 | global batch size: 48 | lm loss: 6.293415E+00 | loss scale: 4096.0 | grad norm: 92738.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5403/ 159576 | consumed samples: 141216 | elapsed time per iteration (ms): 15632.0 | learning rate: 3.908E-05 | global batch size: 48 | lm loss: 6.294649E+00 | loss scale: 4096.0 | grad norm: 62910.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5404/ 159576 | consumed samples: 141264 | elapsed time per iteration (ms): 15497.6 | learning rate: 3.910E-05 | global batch size: 48 | lm loss: 6.331801E+00 | loss scale: 4096.0 | grad norm: 64648.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5405/ 159576 | consumed samples: 141312 | elapsed time per iteration (ms): 15498.1 | learning rate: 3.911E-05 | global batch size: 48 | lm loss: 6.406822E+00 | loss scale: 4096.0 | grad norm: 71416.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5406/ 159576 | consumed samples: 141360 | elapsed time per iteration (ms): 15867.4 | learning rate: 3.912E-05 | global batch size: 48 | lm loss: 6.404875E+00 | loss scale: 4096.0 | grad norm: 56955.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5407/ 159576 | consumed samples: 141408 | elapsed time per iteration (ms): 15506.2 | learning rate: 3.914E-05 | global batch size: 48 | lm loss: 6.428100E+00 | loss scale: 4096.0 | grad norm: 65410.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5408/ 159576 | consumed samples: 141456 | elapsed time per iteration (ms): 15573.9 | learning rate: 3.915E-05 | global batch size: 48 | lm loss: 6.352518E+00 | loss scale: 4096.0 | grad norm: 57463.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5409/ 159576 | consumed samples: 141504 | elapsed time per iteration (ms): 15570.8 | learning rate: 3.916E-05 | global batch size: 48 | lm loss: 6.276915E+00 | loss scale: 4096.0 | grad norm: 56808.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5410/ 159576 | consumed samples: 141552 | elapsed time per iteration (ms): 15647.9 | learning rate: 3.918E-05 | global batch size: 48 | lm loss: 6.388402E+00 | loss scale: 4096.0 | grad norm: 55831.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5411/ 159576 | consumed samples: 141600 | elapsed time per iteration (ms): 15527.8 | learning rate: 3.919E-05 | global batch size: 48 | lm loss: 6.359324E+00 | loss scale: 4096.0 | grad norm: 58176.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5412/ 159576 | consumed samples: 141648 | elapsed time per iteration (ms): 15485.9 | learning rate: 3.920E-05 | global batch size: 48 | lm loss: 6.410316E+00 | loss scale: 4096.0 | grad norm: 58797.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5413/ 159576 | consumed samples: 141696 | elapsed time per iteration (ms): 15570.6 | learning rate: 3.922E-05 | global batch size: 48 | lm loss: 6.487602E+00 | loss scale: 4096.0 | grad norm: 54779.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5414/ 159576 | consumed samples: 141744 | elapsed time per iteration (ms): 15692.4 | learning rate: 3.923E-05 | global batch size: 48 | lm loss: 6.538764E+00 | loss scale: 4096.0 | grad norm: 56952.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
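[editor's note: the very next record marks a step in the batch-size ramp-up: the global batch size moves from 48 to 64, consumed samples start advancing by 64 per iteration, and although per-iteration time rises from roughly 15.5 s to 16.4 s, throughput still improves from about 3.1 to 3.9 samples/s. A small spot-check of that arithmetic, with numbers copied from the records on either side of the step (iterations 5414 and 5415):

    # Consumed samples advance by the new batch size across the step.
    assert 141744 + 64 == 141808

    # Throughput before and after the step.
    for batch, ms in ((48, 15692.4), (64, 16423.4)):
        print(f"batch {batch}: {batch / (ms / 1000):.2f} samples/s")
    # batch 48: 3.06 samples/s
    # batch 64: 3.90 samples/s

end of note]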
iteration 5415/ 159576 | consumed samples: 141808 | elapsed time per iteration (ms): 16423.4 | learning rate: 3.925E-05 | global batch size: 64 | lm loss: 6.468464E+00 | loss scale: 4096.0 | grad norm: 47962.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5416/ 159576 | consumed samples: 141872 | elapsed time per iteration (ms): 16486.4 | learning rate: 3.927E-05 | global batch size: 64 | lm loss: 6.358836E+00 | loss scale: 4096.0 | grad norm: 79746.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5417/ 159576 | consumed samples: 141936 | elapsed time per iteration (ms): 16837.9 | learning rate: 3.928E-05 | global batch size: 64 | lm loss: 6.458796E+00 | loss scale: 4096.0 | grad norm: 72485.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5418/ 159576 | consumed samples: 142000 | elapsed time per iteration (ms): 16282.1 | learning rate: 3.930E-05 | global batch size: 64 | lm loss: 6.325031E+00 | loss scale: 4096.0 | grad norm: 50657.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5419/ 159576 | consumed samples: 142064 | elapsed time per iteration (ms): 16473.5 | learning rate: 3.932E-05 | global batch size: 64 | lm loss: 6.393603E+00 | loss scale: 4096.0 | grad norm: 53317.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5420/ 159576 | consumed samples: 142128 | elapsed time per iteration (ms): 16358.3 | learning rate: 3.934E-05 | global batch size: 64 | lm loss: 6.505975E+00 | loss scale: 4096.0 | grad norm: 76759.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5421/ 159576 | consumed samples: 142192 | elapsed time per iteration (ms): 16646.9 | learning rate: 3.936E-05 | global batch size: 64 | lm loss: 6.377459E+00 | loss scale: 4096.0 | grad norm: 61658.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5422/ 159576 | consumed samples: 142256 | elapsed time per iteration (ms): 16480.4 | learning rate: 3.937E-05 | global batch size: 64 | lm loss: 6.350579E+00 | loss scale: 4096.0 | grad norm: 61672.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5423/ 159576 | consumed samples: 142320 | elapsed time per iteration (ms): 16500.8 | learning rate: 3.939E-05 | global batch size: 64 | lm loss: 6.359305E+00 | loss scale: 4096.0 | grad norm: 71934.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5424/ 159576 | consumed samples: 142384 | elapsed time per iteration (ms): 16400.7 | learning rate: 3.941E-05 | global batch size: 64 | lm loss: 6.515474E+00 | loss scale: 4096.0 | grad norm: 62262.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5425/ 159576 | consumed samples: 142448 | elapsed time per iteration (ms): 16686.7 | learning rate: 3.943E-05 | global batch size: 64 | lm loss: 6.377324E+00 | loss scale: 4096.0 | grad norm: 66128.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5426/ 159576 | consumed samples: 142512 | elapsed time per iteration (ms): 16346.9 | learning rate: 3.944E-05 | global batch size: 64 | lm loss: 6.394655E+00 | loss scale: 4096.0 | grad norm: 64276.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5427/ 159576 | consumed samples: 142576 | elapsed time per iteration (ms): 16454.0 | learning rate: 3.946E-05 | global batch size: 64 | lm loss: 6.417256E+00 | loss scale: 4096.0 | grad norm: 55916.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5428/ 159576 | consumed samples: 142640 | elapsed time per iteration (ms): 16713.8 | learning rate: 3.948E-05 | global batch size: 64 | lm loss: 6.314127E+00 | loss scale: 4096.0 | grad norm: 65443.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5429/ 159576 | consumed samples: 142704 | elapsed time per iteration (ms): 16492.7 | learning rate: 3.950E-05 | global batch size: 64 | lm loss: 6.349669E+00 | loss scale: 4096.0 | grad norm: 64819.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5430/ 159576 | consumed samples: 142768 | elapsed time per iteration (ms): 16430.1 | learning rate: 3.951E-05 | global batch size: 64 | lm loss: 6.406096E+00 | loss scale: 4096.0 | grad norm: 72027.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5431/ 159576 | consumed samples: 142832 | elapsed time per iteration (ms): 16452.9 | learning rate: 3.953E-05 | global batch size: 64 | lm loss: 6.422045E+00 | loss scale: 4096.0 | grad norm: 59470.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5432/ 159576 | consumed samples: 142896 | elapsed time per iteration (ms): 16574.0 | learning rate: 3.955E-05 | global batch size: 64 | lm loss: 6.384964E+00 | loss scale: 4096.0 | grad norm: 59229.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5433/ 159576 | consumed samples: 142960 | elapsed time per iteration (ms): 16448.4 | learning rate: 3.957E-05 | global batch size: 64 | lm loss: 6.388242E+00 | loss scale: 4096.0 | grad norm: 51139.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5434/ 159576 | consumed samples: 143024 | elapsed time per iteration (ms): 16378.2 | learning rate: 3.959E-05 | global batch size: 64 | lm loss: 6.422913E+00 | loss scale: 4096.0 | grad norm: 55548.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5435/ 159576 | consumed samples: 143088 | elapsed time per iteration (ms): 16838.8 | learning rate: 3.960E-05 | global batch size: 64 | lm loss: 6.399693E+00 | loss scale: 4096.0 | grad norm: 87728.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5436/ 159576 | consumed samples: 143152 | elapsed time per iteration (ms): 16458.9 | learning rate: 3.962E-05 | global batch size: 64 | lm loss: 6.291359E+00 | loss scale: 4096.0 | grad norm: 65955.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5437/ 159576 | consumed samples: 143216 | elapsed time per iteration (ms): 16425.2 | learning rate: 3.964E-05 | global batch size: 64 | lm loss: 6.367932E+00 | loss scale: 4096.0 | grad norm: 63150.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5438/ 159576 | consumed samples: 143280 | elapsed time per iteration (ms): 16418.8 | learning rate: 3.966E-05 | global batch size: 64 | lm loss: 6.365756E+00 | loss scale: 4096.0 | grad norm: 57427.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5439/ 159576 | consumed samples: 143344 | elapsed time per iteration (ms): 16802.3 | learning rate: 3.967E-05 | global batch size: 64 | lm loss: 6.415596E+00 | loss scale: 4096.0 | grad norm: 61605.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5440/ 159576 | consumed samples: 143408 | elapsed time per iteration (ms): 16516.9 | learning rate: 3.969E-05 | global batch size: 64 | lm loss: 6.414165E+00 | loss scale: 4096.0 | grad norm: 64434.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5441/ 159576 | consumed samples: 143472 | elapsed time per iteration (ms): 16398.0 | learning rate: 3.971E-05 | global batch size: 64 | lm loss: 6.425170E+00 | loss scale: 4096.0 | grad norm: 63830.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5442/ 159576 | consumed samples: 143536 | elapsed time per iteration (ms): 16330.0 | learning rate: 3.973E-05 | global batch size: 64 | lm loss: 6.420317E+00 | loss scale: 4096.0 | grad norm: 80818.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5443/ 159576 | consumed samples: 143600 | elapsed time per iteration (ms): 16646.2 | learning rate: 3.975E-05 | global batch size: 64 | lm loss: 6.404300E+00 | loss scale: 4096.0 | grad norm: 66058.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5444/ 159576 | consumed samples: 143664 | elapsed time per iteration (ms): 16389.9 | learning rate: 3.976E-05 | global batch size: 64 | lm loss: 6.307170E+00 | loss scale: 4096.0 | grad norm: 64553.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5445/ 159576 | consumed samples: 143728 | elapsed time per iteration (ms): 16425.8 | learning rate: 3.978E-05 | global batch size: 64 | lm loss: 6.474117E+00 | loss scale: 4096.0 | grad norm: 54414.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5446/ 159576 | consumed samples: 143792 | elapsed time per iteration (ms): 16855.6 | learning rate: 3.980E-05 | global batch size: 64 | lm loss: 6.329272E+00 | loss scale: 4096.0 | grad norm: 67896.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5447/ 159576 | consumed samples: 143856 | elapsed time per iteration (ms): 16363.1 | learning rate: 3.982E-05 | global batch size: 64 | lm loss: 6.485427E+00 | loss scale: 4096.0 | grad norm: 55200.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5448/ 159576 | consumed samples: 143920 | elapsed time per iteration (ms): 16446.4 | learning rate: 3.983E-05 | global batch size: 64 | lm loss: 6.474103E+00 | loss scale: 4096.0 | grad norm: 58759.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5449/ 159576 | consumed samples: 143984 | elapsed time per iteration (ms): 16365.5 | learning rate: 3.985E-05 | global batch size: 64 | lm loss: 6.386650E+00 | loss scale: 4096.0 | grad norm: 69075.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5450/ 159576 | consumed samples: 144048 | elapsed time per iteration (ms): 16855.4 | learning rate: 3.987E-05 | global batch size: 64 | lm loss: 6.407839E+00 | loss scale: 4096.0 | grad norm: 76751.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5451/ 159576 | consumed samples: 144112 | elapsed time per iteration (ms): 16481.2 | learning rate: 3.989E-05 | global batch size: 64 | lm loss: 6.437217E+00 | loss scale: 4096.0 | grad norm: 60762.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5452/ 159576 | consumed samples: 144176 | elapsed time per iteration (ms): 16387.3 | learning rate: 3.991E-05 | global batch size: 64 | lm loss: 6.391966E+00 | loss scale: 4096.0 | grad norm: 57835.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5453/ 159576 | consumed samples: 144240 | elapsed time per iteration (ms): 16456.9 | learning rate: 3.992E-05 | global batch size: 64 | lm loss: 6.407461E+00 | loss scale: 4096.0 | grad norm: 56276.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5454/ 159576 | consumed samples: 144304 | elapsed time per iteration (ms): 16533.3 | learning rate: 3.994E-05 | global batch size: 64 | lm loss: 6.319425E+00 | loss scale: 4096.0 | grad norm: 66856.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5455/ 159576 | consumed samples: 144368 | elapsed time per iteration (ms): 16417.1 | learning rate: 3.996E-05 | global batch size: 64 | lm loss: 6.377168E+00 | loss scale: 4096.0 | grad norm: 53863.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5456/ 159576 | consumed samples: 144432 | elapsed time per iteration (ms): 16422.1 | learning rate: 3.998E-05 | global batch size: 64 | lm loss: 6.368913E+00 | loss scale: 4096.0 | grad norm: 63261.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5457/ 159576 | consumed samples: 144496 | elapsed time per iteration (ms): 16738.2 | learning rate: 3.999E-05 | global batch size: 64 | lm loss: 6.264383E+00 | loss scale: 4096.0 | grad norm: 64656.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5458/ 159576 | consumed samples: 144560 | elapsed time per iteration (ms): 16315.9 | learning rate: 4.001E-05 | global batch size: 64 | lm loss: 6.410008E+00 | loss scale: 4096.0 | grad norm: 82472.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5459/ 159576 | consumed samples: 144624 | elapsed time per iteration (ms): 16385.7 | learning rate: 4.003E-05 | global batch size: 64 | lm loss: 6.419100E+00 | loss scale: 4096.0 | grad norm: 81581.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5460/ 159576 | consumed samples: 144688 | elapsed time per iteration (ms): 16422.6 | learning rate: 4.005E-05 | global batch size: 64 | lm loss: 6.374327E+00 | loss scale: 4096.0 | grad norm: 77883.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5461/ 159576 | consumed samples: 144752 | elapsed time per iteration (ms): 16514.0 | learning rate: 4.007E-05 | global batch size: 64 | lm loss: 6.323710E+00 | loss scale: 4096.0 | grad norm: 59535.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5462/ 159576 | consumed samples: 144816 | elapsed time per iteration (ms): 16520.4 | learning rate: 4.008E-05 | global batch size: 64 | lm loss: 6.325150E+00 | loss scale: 4096.0 | grad norm: 54807.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5463/ 159576 | consumed samples: 144880 | elapsed time per iteration (ms): 16362.9 | learning rate: 4.010E-05 | global batch size: 64 | lm loss: 6.461391E+00 | loss scale: 4096.0 | grad norm: 74839.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5464/ 159576 | consumed samples: 144944 | elapsed time per iteration (ms): 16408.3 | learning rate: 4.012E-05 | global batch size: 64 | lm loss: 6.392217E+00 | loss scale: 4096.0 | grad norm: 61727.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5465/ 159576 | consumed samples: 145008 | elapsed time per iteration (ms): 16556.8 | learning rate: 4.014E-05 | global batch size: 64 | lm loss: 6.349445E+00 | loss scale: 4096.0 | grad norm: 90938.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5466/ 159576 | consumed samples: 145072 | elapsed time per iteration (ms): 16389.1 | learning rate: 4.015E-05 | global batch size: 64 | lm loss: 6.314983E+00 | loss scale: 4096.0 | grad norm: 62408.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5467/ 159576 | consumed samples: 145136 | elapsed time per iteration (ms): 16364.1 | learning rate: 4.017E-05 | global batch size: 64 | lm loss: 6.412921E+00 | loss scale: 4096.0 | grad norm: 82535.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5468/ 159576 | consumed samples: 145200 | elapsed time per iteration (ms): 16712.9 | learning rate: 4.019E-05 | global batch size: 64 | lm loss: 6.508467E+00 | loss scale: 4096.0 | grad norm: 53388.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5469/ 159576 | consumed samples: 145264 | elapsed time per iteration (ms): 16357.7 | learning rate: 4.021E-05 | global batch size: 64 | lm loss: 6.367021E+00 | loss scale: 4096.0 | grad norm: 88053.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5470/ 159576 | consumed samples: 145328 | elapsed time per iteration (ms): 16424.7 | learning rate: 4.022E-05 | global batch size: 64 | lm loss: 6.396588E+00 | loss scale: 4096.0 | grad norm: 83281.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5471/ 159576 | consumed samples: 145392 | elapsed time per iteration (ms): 16363.6 | learning rate: 4.024E-05 | global batch size: 64 | lm loss: 6.387273E+00 | loss scale: 4096.0 | grad norm: 56875.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5472/ 159576 | consumed samples: 145456 | elapsed time per iteration (ms): 16523.2 | learning rate: 4.026E-05 | global batch size: 64 | lm loss: 6.456463E+00 | loss scale: 4096.0 | grad norm: 60270.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
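[editor's note: throughout this window the loss scale holds at 4096.0 with zero skipped and zero nan iterations. Under dynamic loss scaling the scale is cut and the step skipped when an fp16 overflow is detected, so a flat scale with no skips indicates no overflows here. A consistency pass one could run over records parsed with parse_record above (a sketch; assumes the records list built earlier):

    # Consumed samples must advance by exactly the global batch size, and
    # the skip/nan counters should stay at zero while the scale holds.
    def check(records):
        for prev, cur in zip(records, records[1:]):
            delta = cur["consumed samples"] - prev["consumed samples"]
            assert delta == cur["global batch size"], cur["iteration"]
            assert cur["number of skipped iterations"] == 0
            assert cur["number of nan iterations"] == 0
            assert cur["loss scale"] == 4096.0  # no fp16 overflow in this window

end of note]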
iteration 5473/ 159576 | consumed samples: 145520 | elapsed time per iteration (ms): 16398.7 | learning rate: 4.028E-05 | global batch size: 64 | lm loss: 6.460003E+00 | loss scale: 4096.0 | grad norm: 61151.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5474/ 159576 | consumed samples: 145584 | elapsed time per iteration (ms): 16345.5 | learning rate: 4.030E-05 | global batch size: 64 | lm loss: 6.443559E+00 | loss scale: 4096.0 | grad norm: 83130.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5475/ 159576 | consumed samples: 145648 | elapsed time per iteration (ms): 16591.9 | learning rate: 4.031E-05 | global batch size: 64 | lm loss: 6.454519E+00 | loss scale: 4096.0 | grad norm: 69198.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5476/ 159576 | consumed samples: 145712 | elapsed time per iteration (ms): 16643.0 | learning rate: 4.033E-05 | global batch size: 64 | lm loss: 6.424469E+00 | loss scale: 4096.0 | grad norm: 57626.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5477/ 159576 | consumed samples: 145776 | elapsed time per iteration (ms): 16362.1 | learning rate: 4.035E-05 | global batch size: 64 | lm loss: 6.285227E+00 | loss scale: 4096.0 | grad norm: 87864.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5478/ 159576 | consumed samples: 145840 | elapsed time per iteration (ms): 16435.9 | learning rate: 4.037E-05 | global batch size: 64 | lm loss: 6.372074E+00 | loss scale: 4096.0 | grad norm: 67542.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5479/ 159576 | consumed samples: 145904 | elapsed time per iteration (ms): 16597.3 | learning rate: 4.038E-05 | global batch size: 64 | lm loss: 6.438199E+00 | loss scale: 4096.0 | grad norm: 74119.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5480/ 159576 | consumed samples: 145968 | elapsed time per iteration (ms): 16483.8 | learning rate: 4.040E-05 | global batch size: 64 | lm loss: 6.487626E+00 | loss scale: 4096.0 | grad norm: 68136.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 00:07:47] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 00:07:47] PULSE: tr8-104B is running for 18:15:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
iteration 5481/ 159576 | consumed samples: 146032 | elapsed time per iteration (ms): 16373.0 | learning rate: 4.042E-05 | global batch size: 64 | lm loss: 6.280901E+00 | loss scale: 4096.0 | grad norm: 89214.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5482/ 159576 | consumed samples: 146096 | elapsed time per iteration (ms): 16391.1 | learning rate: 4.044E-05 | global batch size: 64 | lm loss: 6.407492E+00 | loss scale: 4096.0 | grad norm: 71190.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5483/ 159576 | consumed samples: 146160 | elapsed time per iteration (ms): 16510.6 | learning rate: 4.046E-05 | global batch size: 64 | lm loss: 6.338043E+00 | loss scale: 4096.0 | grad norm: 80052.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5484/ 159576 | consumed samples: 146224 | elapsed time per iteration (ms): 16428.2 | learning rate: 4.047E-05 | global batch size: 64 | lm loss: 6.381162E+00 | loss scale: 4096.0 | grad norm: 66785.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5485/ 159576 | consumed samples: 146288 | elapsed time per iteration (ms): 16390.1 | learning rate: 4.049E-05 | global batch size: 64 | lm loss: 6.377982E+00 | loss scale: 4096.0 | grad norm: 73739.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5486/ 159576 | consumed samples: 146352 | elapsed time per iteration (ms): 16772.0 | learning rate: 4.051E-05 | global batch size: 64 | lm loss: 6.417017E+00 | loss scale: 4096.0 | grad norm: 101012.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5487/ 159576 | consumed samples: 146416 | elapsed time per iteration (ms): 16505.3 | learning rate: 4.053E-05 | global batch size: 64 | lm loss: 6.375125E+00 | loss scale: 4096.0 | grad norm: 62796.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5488/ 159576 | consumed samples: 146480 | elapsed time per iteration (ms): 16398.9 | learning rate: 4.054E-05 | global batch size: 64 | lm loss: 6.370068E+00 | loss scale: 4096.0 | grad norm: 53653.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5489/ 159576 | consumed samples: 146544 | elapsed time per iteration (ms): 16369.7 | learning rate: 4.056E-05 | global batch size: 64 | lm loss: 6.376281E+00 | loss scale: 4096.0 | grad norm: 81099.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5490/ 159576 | consumed samples: 146608 | elapsed time per iteration (ms): 16827.2 | learning rate: 4.058E-05 | global batch size: 64 | lm loss: 6.479604E+00 | loss scale: 4096.0 | grad norm: 63855.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5491/ 159576 | consumed samples: 146672 | elapsed time per iteration (ms): 16415.6 | learning rate: 4.060E-05 | global batch size: 64 | lm loss: 6.352095E+00 | loss scale: 4096.0 | grad norm: 55122.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5492/ 159576 | consumed samples: 146736 | elapsed time per iteration (ms): 16444.9 | learning rate: 4.062E-05 | global batch size: 64 | lm loss: 6.506047E+00 | loss scale: 4096.0 | grad norm: 75137.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5493/ 159576 | consumed samples: 146800 | elapsed time per iteration (ms): 16342.5 | learning rate: 4.063E-05 | global batch size: 64 | lm loss: 6.379695E+00 | loss scale: 4096.0 | grad norm: 66901.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5494/ 159576 | consumed samples: 146864 | elapsed time per iteration (ms): 16502.1 | learning rate: 4.065E-05 | global batch size: 64 | lm loss: 6.368460E+00 | loss scale: 4096.0 | grad norm: 77897.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5495/ 159576 | consumed samples: 146928 | elapsed time per iteration (ms): 16338.1 | learning rate: 4.067E-05 | global batch size: 64 | lm loss: 6.329938E+00 | loss scale: 4096.0 | grad norm: 61931.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5496/ 159576 | consumed samples: 146992 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.069E-05 | global batch size: 64 | lm loss: 6.425272E+00 | loss scale: 4096.0 | grad norm: 66524.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5497/ 159576 | consumed samples: 147056 | elapsed time per iteration (ms): 16765.2 | learning rate: 4.070E-05 | global batch size: 64 | lm loss: 6.296051E+00 | loss scale: 4096.0 | grad norm: 85285.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5498/ 159576 | consumed samples: 147120 | elapsed time per iteration (ms): 16329.2 | learning rate: 4.072E-05 | global batch size: 64 | lm loss: 6.365289E+00 | loss scale: 4096.0 | grad norm: 66015.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5499/ 159576 | consumed samples: 147184 | elapsed time per iteration (ms): 16383.4 | learning rate: 4.074E-05 | global batch size: 64 | lm loss: 6.294851E+00 | loss scale: 4096.0 | grad norm: 79758.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5500/ 159576 | consumed samples: 147248 | elapsed time per iteration (ms): 16337.1 | learning rate: 4.076E-05 | global batch size: 64 | lm loss: 6.289442E+00 | loss scale: 4096.0 | grad norm: 74687.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5501/ 159576 | consumed samples: 147312 | elapsed time per iteration (ms): 16790.4 | learning rate: 4.078E-05 | global batch size: 64 | lm loss: 6.322903E+00 | loss scale: 4096.0 | grad norm: 77364.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5502/ 159576 | consumed samples: 147376 | elapsed time per iteration (ms): 16423.5 | learning rate: 4.079E-05 | global batch size: 64 | lm loss: 6.460203E+00 | loss scale: 4096.0 | grad norm: 73803.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5503/ 159576 | consumed samples: 147440 | elapsed time per iteration (ms): 16368.8 | learning rate: 4.081E-05 | global batch size: 64 | lm loss: 6.396315E+00 | loss scale: 4096.0 | grad norm: 71129.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5504/ 159576 | consumed samples: 147504 | elapsed time per iteration (ms): 16346.2 | learning rate: 4.083E-05 | global batch size: 64 | lm loss: 6.425894E+00 | loss scale: 4096.0 | grad norm: 98647.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5505/ 159576 | consumed samples: 147568 | elapsed time per iteration (ms): 16678.7 | learning rate: 4.085E-05 | global batch size: 64 | lm loss: 6.381792E+00 | loss scale: 4096.0 | grad norm: 89626.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5506/ 159576 | consumed samples: 147632 | elapsed time per iteration (ms): 16332.5 | learning rate: 4.086E-05 | global batch size: 64 | lm loss: 6.483613E+00 | loss scale: 4096.0 | grad norm: 94069.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5507/ 159576 | consumed samples: 147696 | elapsed time per iteration (ms): 16400.4 | learning rate: 4.088E-05 | global batch size: 64 | lm loss: 6.236539E+00 | loss scale: 4096.0 | grad norm: 66871.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5508/ 159576 | consumed samples: 147760 | elapsed time per iteration (ms): 16657.8 | learning rate: 4.090E-05 | global batch size: 64 | lm loss: 6.445796E+00 | loss scale: 4096.0 | grad norm: 79385.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5509/ 159576 | consumed samples: 147824 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.092E-05 | global batch size: 64 | lm loss: 6.421635E+00 | loss scale: 4096.0 | grad norm: 76910.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5510/ 159576 | consumed samples: 147888 | elapsed time per iteration (ms): 16379.6 | learning rate: 4.093E-05 | global batch size: 64 | lm loss: 6.403854E+00 | loss scale: 4096.0 | grad norm: 131977.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5511/ 159576 | consumed samples: 147952 | elapsed time per iteration (ms): 16364.3 | learning rate: 4.095E-05 | global batch size: 64 | lm loss: 6.393543E+00 | loss scale: 4096.0 | grad norm: 62655.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5512/ 159576 | consumed samples: 148016 | elapsed time per iteration (ms): 16734.0 | learning rate: 4.097E-05 | global batch size: 64 | lm loss: 6.378099E+00 | loss scale: 4096.0 | grad norm: 71057.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5513/ 159576 | consumed samples: 148080 | elapsed time per iteration (ms): 16360.1 | learning rate: 4.099E-05 | global batch size: 64 | lm loss: 6.439700E+00 | loss scale: 4096.0 | grad norm: 78346.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5514/ 159576 | consumed samples: 148144 | elapsed time per iteration (ms): 16356.7 | learning rate: 4.101E-05 | global batch size: 64 | lm loss: 6.380426E+00 | loss scale: 4096.0 | grad norm: 65583.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5515/ 159576 | consumed samples: 148208 | elapsed time per iteration (ms): 16416.2 | learning rate: 4.102E-05 | global batch size: 64 | lm loss: 6.492000E+00 | loss scale: 4096.0 | grad norm: 73724.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5516/ 159576 | consumed samples: 148272 | elapsed time per iteration (ms): 16451.6 | learning rate: 4.104E-05 | global batch size: 64 | lm loss: 6.433869E+00 | loss scale: 4096.0 | grad norm: 93695.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5517/ 159576 | consumed samples: 148336 | elapsed time per iteration (ms): 16367.1 | learning rate: 4.106E-05 | global batch size: 64 | lm loss: 6.316652E+00 | loss scale: 4096.0 | grad norm: 93995.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5518/ 159576 | consumed samples: 148400 | elapsed time per iteration (ms): 16352.2 | learning rate: 4.108E-05 | global batch size: 64 | lm loss: 6.331068E+00 | loss scale: 4096.0 | grad norm: 64601.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5519/ 159576 | consumed samples: 148464 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.109E-05 | global batch size: 64 | lm loss: 6.441586E+00 | loss scale: 4096.0 | grad norm: 74837.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5520/ 159576 | consumed samples: 148528 | elapsed time per iteration (ms): 16346.7 | learning rate: 4.111E-05 | global batch size: 64 | lm loss: 6.422507E+00 | loss scale: 4096.0 | grad norm: 57013.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5521/ 159576 | consumed samples: 148592 | elapsed time per iteration (ms): 16378.9 | learning rate: 4.113E-05 | global batch size: 64 | lm loss: 6.388858E+00 | loss scale: 4096.0 | grad norm: 70843.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5522/ 159576 | consumed samples: 148656 | elapsed time per iteration (ms): 16311.3 | learning rate: 4.115E-05 | global batch size: 64 | lm loss: 6.335554E+00 | loss scale: 4096.0 | grad norm: 57811.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5523/ 159576 | consumed samples: 148720 | elapsed time per iteration (ms): 16599.0 | learning rate: 4.117E-05 | global batch size: 64 | lm loss: 6.427087E+00 | loss scale: 4096.0 | grad norm: 70169.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5524/ 159576 | consumed samples: 148784 | elapsed time per iteration (ms): 16322.1 | learning rate: 4.118E-05 | global batch size: 64 | lm loss: 6.400644E+00 | loss scale: 4096.0 | grad norm: 65162.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5525/ 159576 | consumed samples: 148848 | elapsed time per iteration (ms): 16352.5 | learning rate: 4.120E-05 | global batch size: 64 | lm loss: 6.476854E+00 | loss scale: 4096.0 | grad norm: 105828.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5526/ 159576 | consumed samples: 148912 | elapsed time per iteration (ms): 16357.9 | learning rate: 4.122E-05 | global batch size: 64 | lm loss: 6.444851E+00 | loss scale: 4096.0 | grad norm: 100931.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5527/ 159576 | consumed samples: 148976 | elapsed time per iteration (ms): 16656.2 | learning rate: 4.124E-05 | global batch size: 64 | lm loss: 6.448713E+00 | loss scale: 4096.0 | grad norm: 81570.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5528/ 159576 | consumed samples: 149040 | elapsed time per iteration (ms): 16320.4 | learning rate: 4.125E-05 | global batch size: 64 | lm loss: 6.406240E+00 | loss scale: 4096.0 | grad norm: 82766.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5529/ 159576 | consumed samples: 149104 | elapsed time per iteration (ms): 16353.3 | learning rate: 4.127E-05 | global batch size: 64 | lm loss: 6.376573E+00 | loss scale: 4096.0 | grad norm: 80155.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5530/ 159576 | consumed samples: 149168 | elapsed time per iteration (ms): 16695.5 | learning rate: 4.129E-05 | global batch size: 64 | lm loss: 6.316214E+00 | loss scale: 4096.0 | grad norm: 87358.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5531/ 159576 | consumed samples: 149232 | elapsed time per iteration (ms): 16408.8 | learning rate: 4.131E-05 | global batch size: 64 | lm loss: 6.481884E+00 | loss scale: 4096.0 | grad norm: 86550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5532/ 159576 | consumed samples: 149296 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.133E-05 | global batch size: 64 | lm loss: 6.483734E+00 | loss scale: 4096.0 | grad norm: 89939.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5533/ 159576 | consumed samples: 149360 | elapsed time per iteration (ms): 16370.7 | learning rate: 4.134E-05 | global batch size: 64 | lm loss: 6.318271E+00 | loss scale: 4096.0 | grad norm: 60516.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5534/ 159576 | consumed samples: 149424 | elapsed time per iteration (ms): 16594.8 | learning rate: 4.136E-05 | global batch size: 64 | lm loss: 6.391500E+00 | loss scale: 4096.0 | grad norm: 70379.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5535/ 159576 | consumed samples: 149488 | elapsed time per iteration (ms): 16425.6 | learning rate: 4.138E-05 | global batch size: 64 | lm loss: 6.418231E+00 | loss scale: 4096.0 | grad norm: 76225.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5536/ 159576 | consumed samples: 149552 | elapsed time per iteration (ms): 16364.4 | learning rate: 4.140E-05 | global batch size: 64 | lm loss: 6.461292E+00 | loss scale: 4096.0 | grad norm: 117347.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5537/ 159576 | consumed samples: 149616 | elapsed time per iteration (ms): 16683.3 | learning rate: 4.141E-05 | global batch size: 64 | lm loss: 6.394395E+00 | loss scale: 4096.0 | grad norm: 113236.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5538/ 159576 | consumed samples: 149680 | elapsed time per iteration (ms): 16407.6 | learning rate: 4.143E-05 | global batch size: 64 | lm loss: 6.348366E+00 | loss scale: 4096.0 | grad norm: 72699.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5539/ 159576 | consumed samples: 149744 | elapsed time per iteration (ms): 16372.4 | learning rate: 4.145E-05 | global batch size: 64 | lm loss: 6.395003E+00 | loss scale: 4096.0 | grad norm: 117054.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5540/ 159576 | consumed samples: 149808 | elapsed time per iteration (ms): 16344.7 | learning rate: 4.147E-05 | global batch size: 64 | lm loss: 6.345469E+00 | loss scale: 4096.0 | grad norm: 66826.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5541/ 159576 | consumed samples: 149872 | elapsed time per iteration (ms): 16658.7 | learning rate: 4.149E-05 | global batch size: 64 | lm loss: 6.311511E+00 | loss scale: 4096.0 | grad 
norm: 82398.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5542/ 159576 | consumed samples: 149936 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.150E-05 | global batch size: 64 | lm loss: 6.407408E+00 | loss scale: 4096.0 | grad norm: 95381.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5543/ 159576 | consumed samples: 150000 | elapsed time per iteration (ms): 16397.3 | learning rate: 4.152E-05 | global batch size: 64 | lm loss: 6.385950E+00 | loss scale: 4096.0 | grad norm: 84966.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5544/ 159576 | consumed samples: 150064 | elapsed time per iteration (ms): 16328.2 | learning rate: 4.154E-05 | global batch size: 64 | lm loss: 6.386173E+00 | loss scale: 4096.0 | grad norm: 76280.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5545/ 159576 | consumed samples: 150128 | elapsed time per iteration (ms): 16536.9 | learning rate: 4.156E-05 | global batch size: 64 | lm loss: 6.429965E+00 | loss scale: 4096.0 | grad norm: 86199.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5546/ 159576 | consumed samples: 150192 | elapsed time per iteration (ms): 16341.0 | learning rate: 4.157E-05 | global batch size: 64 | lm loss: 6.440814E+00 | loss scale: 4096.0 | grad norm: 79643.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5547/ 159576 | consumed samples: 150256 | elapsed time per iteration (ms): 16434.5 | learning rate: 4.159E-05 | global batch size: 64 | lm loss: 6.292027E+00 | loss scale: 4096.0 | grad norm: 79649.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5548/ 159576 | consumed samples: 150320 | elapsed time per iteration (ms): 16744.8 | learning rate: 4.161E-05 | global batch size: 64 | lm loss: 6.363777E+00 | loss scale: 4096.0 | grad norm: 105818.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5549/ 159576 | consumed samples: 150384 | elapsed time per iteration (ms): 16446.0 | learning rate: 4.163E-05 | global batch size: 64 | lm loss: 6.525520E+00 | loss scale: 4096.0 | grad norm: 98900.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5550/ 159576 | consumed samples: 150448 | elapsed time per iteration (ms): 16313.7 | learning rate: 4.164E-05 | global batch size: 64 | lm loss: 6.426298E+00 | loss scale: 4096.0 | grad norm: 160080.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5551/ 159576 | consumed samples: 150512 | elapsed time per iteration (ms): 16414.2 | learning rate: 4.166E-05 | global batch size: 64 | lm loss: 6.409907E+00 | loss scale: 4096.0 | grad norm: 101291.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5552/ 159576 | consumed samples: 150576 | elapsed time per iteration (ms): 16772.9 | learning rate: 4.168E-05 | global batch size: 64 | lm loss: 6.312022E+00 | loss scale: 4096.0 | grad norm: 93961.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5553/ 159576 | consumed samples: 150640 | elapsed time per iteration (ms): 
16393.9 | learning rate: 4.170E-05 | global batch size: 64 | lm loss: 6.460764E+00 | loss scale: 4096.0 | grad norm: 83044.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5554/ 159576 | consumed samples: 150704 | elapsed time per iteration (ms): 16414.7 | learning rate: 4.172E-05 | global batch size: 64 | lm loss: 6.395907E+00 | loss scale: 4096.0 | grad norm: 71935.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5555/ 159576 | consumed samples: 150768 | elapsed time per iteration (ms): 16459.3 | learning rate: 4.173E-05 | global batch size: 64 | lm loss: 6.381772E+00 | loss scale: 4096.0 | grad norm: 92358.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5556/ 159576 | consumed samples: 150832 | elapsed time per iteration (ms): 16620.5 | learning rate: 4.175E-05 | global batch size: 64 | lm loss: 6.334103E+00 | loss scale: 4096.0 | grad norm: 135953.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5557/ 159576 | consumed samples: 150896 | elapsed time per iteration (ms): 16420.0 | learning rate: 4.177E-05 | global batch size: 64 | lm loss: 6.350534E+00 | loss scale: 4096.0 | grad norm: 106866.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5558/ 159576 | consumed samples: 150960 | elapsed time per iteration (ms): 16394.5 | learning rate: 4.179E-05 | global batch size: 64 | lm loss: 6.449617E+00 | loss scale: 4096.0 | grad norm: 73758.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5559/ 159576 | consumed samples: 151024 | elapsed time per iteration (ms): 16702.3 | learning rate: 4.180E-05 | global batch size: 64 | lm loss: 6.422152E+00 | loss scale: 4096.0 | grad norm: 89216.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5560/ 159576 | consumed samples: 151088 | elapsed time per iteration (ms): 16526.0 | learning rate: 4.182E-05 | global batch size: 64 | lm loss: 6.502412E+00 | loss scale: 4096.0 | grad norm: 75899.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5561/ 159576 | consumed samples: 151152 | elapsed time per iteration (ms): 16388.8 | learning rate: 4.184E-05 | global batch size: 64 | lm loss: 6.353260E+00 | loss scale: 4096.0 | grad norm: 77216.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5562/ 159576 | consumed samples: 151216 | elapsed time per iteration (ms): 16375.8 | learning rate: 4.186E-05 | global batch size: 64 | lm loss: 6.380834E+00 | loss scale: 4096.0 | grad norm: 108978.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5563/ 159576 | consumed samples: 151280 | elapsed time per iteration (ms): 16840.5 | learning rate: 4.188E-05 | global batch size: 64 | lm loss: 6.389106E+00 | loss scale: 4096.0 | grad norm: 109665.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5564/ 159576 | consumed samples: 151344 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.189E-05 | global batch size: 64 | lm loss: 6.440452E+00 | loss scale: 4096.0 | grad norm: 455190.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5565/ 159576 | consumed samples: 151408 | elapsed time per iteration (ms): 16403.9 | learning rate: 4.191E-05 | global batch size: 64 | lm loss: 6.425446E+00 | loss scale: 4096.0 | grad norm: 121150.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5566/ 159576 | consumed samples: 151472 | elapsed time per iteration (ms): 16435.1 | learning rate: 4.193E-05 | global batch size: 64 | lm loss: 6.344089E+00 | loss scale: 4096.0 | grad norm: 92189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5567/ 159576 | consumed samples: 151536 | elapsed time per iteration (ms): 16459.4 | learning rate: 4.195E-05 | global batch size: 64 | lm loss: 6.402337E+00 | loss scale: 4096.0 | grad norm: 84995.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5568/ 159576 | consumed samples: 151600 | elapsed time per iteration (ms): 16389.2 | learning rate: 4.196E-05 | global batch size: 64 | lm loss: 6.522965E+00 | loss scale: 4096.0 | grad norm: 82583.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5569/ 159576 | consumed samples: 151664 | elapsed time per iteration (ms): 16371.9 | learning rate: 4.198E-05 | global batch size: 64 | lm loss: 6.357002E+00 | loss scale: 4096.0 | grad norm: 107776.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5570/ 159576 | consumed samples: 151728 | elapsed time per iteration (ms): 16715.6 | learning rate: 4.200E-05 | global batch size: 64 | lm loss: 6.462955E+00 | loss scale: 4096.0 | grad norm: 81656.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5571/ 159576 | consumed samples: 151792 | elapsed time per iteration (ms): 16448.5 | learning rate: 4.202E-05 | global batch size: 64 | lm loss: 6.378518E+00 | loss scale: 4096.0 | grad norm: 97168.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5572/ 159576 | consumed samples: 151856 | elapsed time per iteration (ms): 16375.2 | learning rate: 4.204E-05 | global batch size: 64 | lm loss: 6.426227E+00 | loss scale: 4096.0 | grad norm: 138499.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5573/ 159576 | consumed samples: 151920 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.205E-05 | global batch size: 64 | lm loss: 6.467142E+00 | loss scale: 4096.0 | grad norm: 86986.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5574/ 159576 | consumed samples: 151984 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.207E-05 | global batch size: 64 | lm loss: 6.343758E+00 | loss scale: 4096.0 | grad norm: 94104.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5575/ 159576 | consumed samples: 152048 | elapsed time per iteration (ms): 16384.3 | learning rate: 4.209E-05 | global batch size: 64 | lm loss: 6.264513E+00 | loss scale: 4096.0 | grad norm: 84463.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5576/ 159576 | consumed samples: 152112 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.211E-05 | global batch size: 64 | lm loss: 6.395695E+00 | 
loss scale: 4096.0 | grad norm: 91060.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5577/ 159576 | consumed samples: 152176 | elapsed time per iteration (ms): 16399.6 | learning rate: 4.212E-05 | global batch size: 64 | lm loss: 6.322819E+00 | loss scale: 4096.0 | grad norm: 78884.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5578/ 159576 | consumed samples: 152240 | elapsed time per iteration (ms): 16529.4 | learning rate: 4.214E-05 | global batch size: 64 | lm loss: 6.361033E+00 | loss scale: 4096.0 | grad norm: 132712.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5579/ 159576 | consumed samples: 152304 | elapsed time per iteration (ms): 16454.4 | learning rate: 4.216E-05 | global batch size: 64 | lm loss: 6.276022E+00 | loss scale: 4096.0 | grad norm: 112417.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5580/ 159576 | consumed samples: 152368 | elapsed time per iteration (ms): 16401.1 | learning rate: 4.218E-05 | global batch size: 64 | lm loss: 6.375633E+00 | loss scale: 4096.0 | grad norm: 85824.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5581/ 159576 | consumed samples: 152432 | elapsed time per iteration (ms): 16688.1 | learning rate: 4.220E-05 | global batch size: 64 | lm loss: 6.447036E+00 | loss scale: 4096.0 | grad norm: 88314.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5582/ 159576 | consumed samples: 152496 | elapsed time per iteration (ms): 16427.8 | learning rate: 4.221E-05 | global batch size: 64 | lm loss: 6.438461E+00 | loss scale: 4096.0 | grad norm: 91826.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5583/ 159576 | consumed samples: 152560 | elapsed time per iteration (ms): 16326.4 | learning rate: 4.223E-05 | global batch size: 64 | lm loss: 6.404251E+00 | loss scale: 4096.0 | grad norm: 79746.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5584/ 159576 | consumed samples: 152624 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.225E-05 | global batch size: 64 | lm loss: 6.470784E+00 | loss scale: 4096.0 | grad norm: 78255.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5585/ 159576 | consumed samples: 152688 | elapsed time per iteration (ms): 16577.7 | learning rate: 4.227E-05 | global batch size: 64 | lm loss: 6.352365E+00 | loss scale: 4096.0 | grad norm: 85894.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5586/ 159576 | consumed samples: 152752 | elapsed time per iteration (ms): 16409.6 | learning rate: 4.228E-05 | global batch size: 64 | lm loss: 6.367690E+00 | loss scale: 4096.0 | grad norm: 268686.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5587/ 159576 | consumed samples: 152816 | elapsed time per iteration (ms): 16393.7 | learning rate: 4.230E-05 | global batch size: 64 | lm loss: 6.334382E+00 | loss scale: 4096.0 | grad norm: 92996.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5588/ 159576 | consumed samples: 152880 | elapsed 
time per iteration (ms): 16647.8 | learning rate: 4.232E-05 | global batch size: 64 | lm loss: 6.174354E+00 | loss scale: 4096.0 | grad norm: 99570.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5589/ 159576 | consumed samples: 152944 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.234E-05 | global batch size: 64 | lm loss: 6.349049E+00 | loss scale: 4096.0 | grad norm: 74523.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5590/ 159576 | consumed samples: 153008 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.236E-05 | global batch size: 64 | lm loss: 6.388356E+00 | loss scale: 4096.0 | grad norm: 57623.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5591/ 159576 | consumed samples: 153072 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.237E-05 | global batch size: 64 | lm loss: 6.399694E+00 | loss scale: 4096.0 | grad norm: 75852.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5592/ 159576 | consumed samples: 153136 | elapsed time per iteration (ms): 16704.7 | learning rate: 4.239E-05 | global batch size: 64 | lm loss: 6.327959E+00 | loss scale: 4096.0 | grad norm: 69452.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5593/ 159576 | consumed samples: 153200 | elapsed time per iteration (ms): 16334.3 | learning rate: 4.241E-05 | global batch size: 64 | lm loss: 6.435533E+00 | loss scale: 4096.0 | grad norm: 111529.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5594/ 159576 | consumed samples: 153264 | elapsed time per iteration (ms): 16385.3 | learning rate: 4.243E-05 | global batch size: 64 | lm loss: 6.438297E+00 | loss scale: 4096.0 | grad norm: 154695.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5595/ 159576 | consumed samples: 153328 | elapsed time per iteration (ms): 16343.1 | learning rate: 4.244E-05 | global batch size: 64 | lm loss: 6.431480E+00 | loss scale: 4096.0 | grad norm: 133987.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5596/ 159576 | consumed samples: 153392 | elapsed time per iteration (ms): 16571.5 | learning rate: 4.246E-05 | global batch size: 64 | lm loss: 6.326744E+00 | loss scale: 4096.0 | grad norm: 65072.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5597/ 159576 | consumed samples: 153456 | elapsed time per iteration (ms): 16304.0 | learning rate: 4.248E-05 | global batch size: 64 | lm loss: 6.450805E+00 | loss scale: 4096.0 | grad norm: 67613.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5598/ 159576 | consumed samples: 153520 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.250E-05 | global batch size: 64 | lm loss: 6.327376E+00 | loss scale: 4096.0 | grad norm: 77614.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5599/ 159576 | consumed samples: 153584 | elapsed time per iteration (ms): 16672.4 | learning rate: 4.251E-05 | global batch size: 64 | lm loss: 6.502485E+00 | loss scale: 4096.0 | grad norm: 97568.320 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5600/ 159576 | consumed samples: 153648 | elapsed time per iteration (ms): 16410.3 | learning rate: 4.253E-05 | global batch size: 64 | lm loss: 6.429380E+00 | loss scale: 4096.0 | grad norm: 84231.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5601/ 159576 | consumed samples: 153712 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.255E-05 | global batch size: 64 | lm loss: 6.436201E+00 | loss scale: 4096.0 | grad norm: 63319.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5602/ 159576 | consumed samples: 153776 | elapsed time per iteration (ms): 16453.8 | learning rate: 4.257E-05 | global batch size: 64 | lm loss: 6.263167E+00 | loss scale: 4096.0 | grad norm: 71392.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5603/ 159576 | consumed samples: 153840 | elapsed time per iteration (ms): 16775.3 | learning rate: 4.259E-05 | global batch size: 64 | lm loss: 6.413259E+00 | loss scale: 4096.0 | grad norm: 123761.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5604/ 159576 | consumed samples: 153904 | elapsed time per iteration (ms): 16504.7 | learning rate: 4.260E-05 | global batch size: 64 | lm loss: 6.544505E+00 | loss scale: 4096.0 | grad norm: 83624.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5605/ 159576 | consumed samples: 153968 | elapsed time per iteration (ms): 16306.6 | learning rate: 4.262E-05 | global batch size: 64 | lm loss: 6.452788E+00 | loss scale: 8192.0 | grad norm: 65011.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5606/ 159576 | consumed samples: 154032 | elapsed time per iteration (ms): 16378.4 | learning rate: 4.264E-05 | global batch size: 64 | lm loss: 6.422714E+00 | loss scale: 8192.0 | grad norm: 246798.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5607/ 159576 | consumed samples: 154096 | elapsed time per iteration (ms): 16552.8 | learning rate: 4.266E-05 | global batch size: 64 | lm loss: 6.375990E+00 | loss scale: 8192.0 | grad norm: 169739.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5608/ 159576 | consumed samples: 154160 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.267E-05 | global batch size: 64 | lm loss: 6.358736E+00 | loss scale: 8192.0 | grad norm: 157950.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5609/ 159576 | consumed samples: 154224 | elapsed time per iteration (ms): 16422.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.444921E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5610/ 159576 | consumed samples: 154288 | elapsed time per iteration (ms): 9561.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.367582E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5611/ 159576 | consumed samples: 154352 | elapsed time per iteration (ms): 16020.4 | learning rate: 4.271E-05 | global batch 
size: 64 | lm loss: 6.341266E+00 | loss scale: 8192.0 | grad norm: 196277.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5612/ 159576 | consumed samples: 154416 | elapsed time per iteration (ms): 16411.4 | learning rate: 4.273E-05 | global batch size: 64 | lm loss: 6.386235E+00 | loss scale: 8192.0 | grad norm: 174236.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5613/ 159576 | consumed samples: 154480 | elapsed time per iteration (ms): 16406.8 | learning rate: 4.275E-05 | global batch size: 64 | lm loss: 6.302393E+00 | loss scale: 8192.0 | grad norm: 159949.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5614/ 159576 | consumed samples: 154544 | elapsed time per iteration (ms): 16823.0 | learning rate: 4.276E-05 | global batch size: 64 | lm loss: 6.427998E+00 | loss scale: 8192.0 | grad norm: 139822.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5615/ 159576 | consumed samples: 154608 | elapsed time per iteration (ms): 16523.9 | learning rate: 4.278E-05 | global batch size: 64 | lm loss: 6.437964E+00 | loss scale: 8192.0 | grad norm: 148561.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5616/ 159576 | consumed samples: 154672 | elapsed time per iteration (ms): 16444.1 | learning rate: 4.280E-05 | global batch size: 64 | lm loss: 6.387279E+00 | loss scale: 8192.0 | grad norm: 165172.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5617/ 159576 | consumed samples: 154736 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.282E-05 | global batch size: 64 | lm loss: 6.365323E+00 | loss scale: 8192.0 | grad norm: 139740.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5618/ 159576 | consumed samples: 154800 | elapsed time per iteration (ms): 16876.6 | learning rate: 4.283E-05 | global batch size: 64 | lm loss: 6.405371E+00 | loss scale: 8192.0 | grad norm: 191865.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5619/ 159576 | consumed samples: 154864 | elapsed time per iteration (ms): 16465.6 | learning rate: 4.285E-05 | global batch size: 64 | lm loss: 6.400004E+00 | loss scale: 8192.0 | grad norm: 131301.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5620/ 159576 | consumed samples: 154928 | elapsed time per iteration (ms): 16407.9 | learning rate: 4.287E-05 | global batch size: 64 | lm loss: 6.424757E+00 | loss scale: 8192.0 | grad norm: 152162.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5621/ 159576 | consumed samples: 154992 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.289E-05 | global batch size: 64 | lm loss: 6.415905E+00 | loss scale: 8192.0 | grad norm: 184054.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5622/ 159576 | consumed samples: 155056 | elapsed time per iteration (ms): 16685.6 | learning rate: 4.291E-05 | global batch size: 64 | lm loss: 6.440601E+00 | loss scale: 8192.0 | grad norm: 290641.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5623/ 
159576 | consumed samples: 155120 | elapsed time per iteration (ms): 16500.9 | learning rate: 4.292E-05 | global batch size: 64 | lm loss: 6.392663E+00 | loss scale: 8192.0 | grad norm: 151394.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5624/ 159576 | consumed samples: 155184 | elapsed time per iteration (ms): 16485.6 | learning rate: 4.294E-05 | global batch size: 64 | lm loss: 6.440325E+00 | loss scale: 8192.0 | grad norm: 132735.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5625/ 159576 | consumed samples: 155248 | elapsed time per iteration (ms): 16832.2 | learning rate: 4.296E-05 | global batch size: 64 | lm loss: 6.382560E+00 | loss scale: 8192.0 | grad norm: 167706.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5626/ 159576 | consumed samples: 155312 | elapsed time per iteration (ms): 16294.5 | learning rate: 4.298E-05 | global batch size: 64 | lm loss: 6.422318E+00 | loss scale: 8192.0 | grad norm: 144671.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5627/ 159576 | consumed samples: 155376 | elapsed time per iteration (ms): 16433.6 | learning rate: 4.299E-05 | global batch size: 64 | lm loss: 6.400022E+00 | loss scale: 8192.0 | grad norm: 174837.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5628/ 159576 | consumed samples: 155440 | elapsed time per iteration (ms): 16385.0 | learning rate: 4.301E-05 | global batch size: 64 | lm loss: 6.465958E+00 | loss scale: 8192.0 | grad norm: 167317.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5629/ 159576 | consumed samples: 155504 | elapsed time per iteration (ms): 16829.3 | learning rate: 4.303E-05 | global batch size: 64 | lm loss: 6.365539E+00 | loss scale: 8192.0 | grad norm: 150073.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5630/ 159576 | consumed samples: 155568 | elapsed time per iteration (ms): 16533.0 | learning rate: 4.305E-05 | global batch size: 64 | lm loss: 6.385098E+00 | loss scale: 8192.0 | grad norm: 132923.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5631/ 159576 | consumed samples: 155632 | elapsed time per iteration (ms): 16451.7 | learning rate: 4.307E-05 | global batch size: 64 | lm loss: 6.314290E+00 | loss scale: 8192.0 | grad norm: 178222.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5632/ 159576 | consumed samples: 155696 | elapsed time per iteration (ms): 16400.8 | learning rate: 4.308E-05 | global batch size: 64 | lm loss: 6.467572E+00 | loss scale: 8192.0 | grad norm: 147545.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5633/ 159576 | consumed samples: 155760 | elapsed time per iteration (ms): 16566.1 | learning rate: 4.310E-05 | global batch size: 64 | lm loss: 6.341013E+00 | loss scale: 8192.0 | grad norm: 200712.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5634/ 159576 | consumed samples: 155824 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.312E-05 | global batch size: 64 | lm loss: 6.319093E+00 | loss scale: 8192.0 | grad norm: 
161666.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5635/ 159576 | consumed samples: 155888 | elapsed time per iteration (ms): 16416.9 | learning rate: 4.314E-05 | global batch size: 64 | lm loss: 6.461274E+00 | loss scale: 8192.0 | grad norm: 572124.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5636/ 159576 | consumed samples: 155952 | elapsed time per iteration (ms): 16756.4 | learning rate: 4.315E-05 | global batch size: 64 | lm loss: 6.453969E+00 | loss scale: 8192.0 | grad norm: 205582.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5637/ 159576 | consumed samples: 156016 | elapsed time per iteration (ms): 16349.2 | learning rate: 4.317E-05 | global batch size: 64 | lm loss: 6.386354E+00 | loss scale: 8192.0 | grad norm: 188662.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5638/ 159576 | consumed samples: 156080 | elapsed time per iteration (ms): 16437.2 | learning rate: 4.319E-05 | global batch size: 64 | lm loss: 6.458478E+00 | loss scale: 8192.0 | grad norm: 208129.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5639/ 159576 | consumed samples: 156144 | elapsed time per iteration (ms): 16478.4 | learning rate: 4.321E-05 | global batch size: 64 | lm loss: 6.361111E+00 | loss scale: 8192.0 | grad norm: 383224.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5640/ 159576 | consumed samples: 156208 | elapsed time per iteration (ms): 16543.3 | learning rate: 4.322E-05 | global batch size: 64 | lm loss: 6.470639E+00 | loss scale: 8192.0 | grad norm: 244281.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5641/ 159576 | consumed samples: 156272 | elapsed time per iteration (ms): 16418.6 | learning rate: 4.324E-05 | global batch size: 64 | lm loss: 6.453573E+00 | loss scale: 8192.0 | grad norm: 246555.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5642/ 159576 | consumed samples: 156336 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.416644E+00 | loss scale: 8192.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5643/ 159576 | consumed samples: 156400 | elapsed time per iteration (ms): 9564.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.433064E+00 | loss scale: 4096.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5644/ 159576 | consumed samples: 156464 | elapsed time per iteration (ms): 16246.5 | learning rate: 4.328E-05 | global batch size: 64 | lm loss: 6.334921E+00 | loss scale: 4096.0 | grad norm: 91031.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5645/ 159576 | consumed samples: 156528 | elapsed time per iteration (ms): 16410.8 | learning rate: 4.330E-05 | global batch size: 64 | lm loss: 6.398224E+00 | loss scale: 4096.0 | grad norm: 82899.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5646/ 159576 | consumed samples: 156592 | elapsed time per iteration (ms): 
16332.5 | learning rate: 4.331E-05 | global batch size: 64 | lm loss: 6.469447E+00 | loss scale: 4096.0 | grad norm: 93235.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5647/ 159576 | consumed samples: 156656 | elapsed time per iteration (ms): 16380.9 | learning rate: 4.333E-05 | global batch size: 64 | lm loss: 6.414939E+00 | loss scale: 4096.0 | grad norm: 98498.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5648/ 159576 | consumed samples: 156720 | elapsed time per iteration (ms): 16453.9 | learning rate: 4.335E-05 | global batch size: 64 | lm loss: 6.435335E+00 | loss scale: 4096.0 | grad norm: 110431.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5649/ 159576 | consumed samples: 156784 | elapsed time per iteration (ms): 16375.1 | learning rate: 4.337E-05 | global batch size: 64 | lm loss: 6.367991E+00 | loss scale: 4096.0 | grad norm: 112025.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5650/ 159576 | consumed samples: 156848 | elapsed time per iteration (ms): 16396.5 | learning rate: 4.338E-05 | global batch size: 64 | lm loss: 6.453450E+00 | loss scale: 4096.0 | grad norm: 142538.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5651/ 159576 | consumed samples: 156912 | elapsed time per iteration (ms): 16662.1 | learning rate: 4.340E-05 | global batch size: 64 | lm loss: 6.376512E+00 | loss scale: 4096.0 | grad norm: 104884.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5652/ 159576 | consumed samples: 156976 | elapsed time per iteration (ms): 16397.7 | learning rate: 4.342E-05 | global batch size: 64 | lm loss: 6.398083E+00 | loss scale: 4096.0 | grad norm: 97434.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5653/ 159576 | consumed samples: 157040 | elapsed time per iteration (ms): 16367.3 | learning rate: 4.344E-05 | global batch size: 64 | lm loss: 6.468301E+00 | loss scale: 4096.0 | grad norm: 189503.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5654/ 159576 | consumed samples: 157104 | elapsed time per iteration (ms): 16332.7 | learning rate: 4.346E-05 | global batch size: 64 | lm loss: 6.449702E+00 | loss scale: 4096.0 | grad norm: 101635.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5655/ 159576 | consumed samples: 157168 | elapsed time per iteration (ms): 16814.3 | learning rate: 4.347E-05 | global batch size: 64 | lm loss: 6.417078E+00 | loss scale: 4096.0 | grad norm: 163445.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5656/ 159576 | consumed samples: 157232 | elapsed time per iteration (ms): 16304.4 | learning rate: 4.349E-05 | global batch size: 64 | lm loss: 6.445296E+00 | loss scale: 4096.0 | grad norm: 90409.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5657/ 159576 | consumed samples: 157296 | elapsed time per iteration (ms): 16400.9 | learning rate: 4.351E-05 | global batch size: 64 | lm loss: 6.445564E+00 | loss scale: 4096.0 | grad norm: 81513.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | time (ms) iteration 5658/ 159576 | consumed samples: 157360 | elapsed time per iteration (ms): 16340.5 | learning rate: 4.353E-05 | global batch size: 64 | lm loss: 6.333720E+00 | loss scale: 4096.0 | grad norm: 134428.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5659/ 159576 | consumed samples: 157424 | elapsed time per iteration (ms): 16553.5 | learning rate: 4.354E-05 | global batch size: 64 | lm loss: 6.401426E+00 | loss scale: 4096.0 | grad norm: 106022.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5660/ 159576 | consumed samples: 157488 | elapsed time per iteration (ms): 16387.3 | learning rate: 4.356E-05 | global batch size: 64 | lm loss: 6.438997E+00 | loss scale: 4096.0 | grad norm: 83955.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5661/ 159576 | consumed samples: 157552 | elapsed time per iteration (ms): 16456.3 | learning rate: 4.358E-05 | global batch size: 64 | lm loss: 6.402083E+00 | loss scale: 4096.0 | grad norm: 85068.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5662/ 159576 | consumed samples: 157616 | elapsed time per iteration (ms): 16696.8 | learning rate: 4.360E-05 | global batch size: 64 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 101578.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5663/ 159576 | consumed samples: 157680 | elapsed time per iteration (ms): 16497.3 | learning rate: 4.362E-05 | global batch size: 64 | lm loss: 6.405056E+00 | loss scale: 4096.0 | grad norm: 90814.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5664/ 159576 | consumed samples: 157744 | elapsed time per iteration (ms): 16393.8 | learning rate: 4.363E-05 | global batch size: 64 | lm loss: 6.437488E+00 | loss scale: 4096.0 | grad norm: 99258.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5665/ 159576 | consumed samples: 157808 | elapsed time per iteration (ms): 16464.8 | learning rate: 4.365E-05 | global batch size: 64 | lm loss: 6.461691E+00 | loss scale: 4096.0 | grad norm: 150615.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5666/ 159576 | consumed samples: 157872 | elapsed time per iteration (ms): 16442.6 | learning rate: 4.367E-05 | global batch size: 64 | lm loss: 6.379485E+00 | loss scale: 4096.0 | grad norm: 87553.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5667/ 159576 | consumed samples: 157936 | elapsed time per iteration (ms): 16408.0 | learning rate: 4.369E-05 | global batch size: 64 | lm loss: 6.436778E+00 | loss scale: 4096.0 | grad norm: 86837.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5668/ 159576 | consumed samples: 158000 | elapsed time per iteration (ms): 16382.6 | learning rate: 4.370E-05 | global batch size: 64 | lm loss: 6.456222E+00 | loss scale: 4096.0 | grad norm: 81561.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5669/ 159576 | consumed samples: 158064 | elapsed time per iteration (ms): 16606.9 | learning rate: 4.372E-05 | global batch size: 64 | lm loss: 
6.394565E+00 | loss scale: 4096.0 | grad norm: 90655.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5670/ 159576 | consumed samples: 158128 | elapsed time per iteration (ms): 16482.0 | learning rate: 4.374E-05 | global batch size: 64 | lm loss: 6.388999E+00 | loss scale: 4096.0 | grad norm: 139861.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5671/ 159576 | consumed samples: 158192 | elapsed time per iteration (ms): 16430.2 | learning rate: 4.376E-05 | global batch size: 64 | lm loss: 6.348672E+00 | loss scale: 4096.0 | grad norm: 79933.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5672/ 159576 | consumed samples: 158256 | elapsed time per iteration (ms): 16343.5 | learning rate: 4.378E-05 | global batch size: 64 | lm loss: 6.358377E+00 | loss scale: 4096.0 | grad norm: 91907.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5673/ 159576 | consumed samples: 158320 | elapsed time per iteration (ms): 16738.6 | learning rate: 4.379E-05 | global batch size: 64 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 81347.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5674/ 159576 | consumed samples: 158384 | elapsed time per iteration (ms): 16377.1 | learning rate: 4.381E-05 | global batch size: 64 | lm loss: 6.330511E+00 | loss scale: 4096.0 | grad norm: 87623.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5675/ 159576 | consumed samples: 158448 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.383E-05 | global batch size: 64 | lm loss: 6.400737E+00 | loss scale: 4096.0 | grad norm: 86243.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5676/ 159576 | consumed samples: 158512 | elapsed time per iteration (ms): 16407.2 | learning rate: 4.385E-05 | global batch size: 64 | lm loss: 6.373343E+00 | loss scale: 4096.0 | grad norm: 112233.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5677/ 159576 | consumed samples: 158576 | elapsed time per iteration (ms): 16504.3 | learning rate: 4.386E-05 | global batch size: 64 | lm loss: 6.340403E+00 | loss scale: 4096.0 | grad norm: 87545.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5678/ 159576 | consumed samples: 158640 | elapsed time per iteration (ms): 16469.6 | learning rate: 4.388E-05 | global batch size: 64 | lm loss: 6.483582E+00 | loss scale: 4096.0 | grad norm: 85898.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5679/ 159576 | consumed samples: 158704 | elapsed time per iteration (ms): 16363.2 | learning rate: 4.390E-05 | global batch size: 64 | lm loss: 6.384809E+00 | loss scale: 4096.0 | grad norm: 75822.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5680/ 159576 | consumed samples: 158768 | elapsed time per iteration (ms): 16705.5 | learning rate: 4.392E-05 | global batch size: 64 | lm loss: 6.360985E+00 | loss scale: 4096.0 | grad norm: 93411.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5681/ 159576 | consumed samples: 
158832 | elapsed time per iteration (ms): 16533.6 | learning rate: 4.393E-05 | global batch size: 64 | lm loss: 6.346332E+00 | loss scale: 4096.0 | grad norm: 98347.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5682/ 159576 | consumed samples: 158896 | elapsed time per iteration (ms): 16424.8 | learning rate: 4.395E-05 | global batch size: 64 | lm loss: 6.452760E+00 | loss scale: 4096.0 | grad norm: 113842.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5683/ 159576 | consumed samples: 158960 | elapsed time per iteration (ms): 16412.1 | learning rate: 4.397E-05 | global batch size: 64 | lm loss: 6.394449E+00 | loss scale: 4096.0 | grad norm: 225192.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5684/ 159576 | consumed samples: 159024 | elapsed time per iteration (ms): 16934.4 | learning rate: 4.399E-05 | global batch size: 64 | lm loss: 6.394941E+00 | loss scale: 4096.0 | grad norm: 81396.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5685/ 159576 | consumed samples: 159088 | elapsed time per iteration (ms): 16454.0 | learning rate: 4.401E-05 | global batch size: 64 | lm loss: 6.261321E+00 | loss scale: 4096.0 | grad norm: 86149.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5686/ 159576 | consumed samples: 159152 | elapsed time per iteration (ms): 16431.5 | learning rate: 4.402E-05 | global batch size: 64 | lm loss: 6.492159E+00 | loss scale: 4096.0 | grad norm: 119300.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5687/ 159576 | consumed samples: 159216 | elapsed time per iteration (ms): 16386.6 | learning rate: 4.404E-05 | global batch size: 64 | lm loss: 6.511878E+00 | loss scale: 4096.0 | grad norm: 91338.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5688/ 159576 | consumed samples: 159280 | elapsed time per iteration (ms): 16584.3 | learning rate: 4.406E-05 | global batch size: 64 | lm loss: 6.442091E+00 | loss scale: 4096.0 | grad norm: 127329.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5689/ 159576 | consumed samples: 159344 | elapsed time per iteration (ms): 16414.9 | learning rate: 4.408E-05 | global batch size: 64 | lm loss: 6.445393E+00 | loss scale: 4096.0 | grad norm: 74818.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5690/ 159576 | consumed samples: 159408 | elapsed time per iteration (ms): 16438.8 | learning rate: 4.409E-05 | global batch size: 64 | lm loss: 6.349149E+00 | loss scale: 4096.0 | grad norm: 90721.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5691/ 159576 | consumed samples: 159472 | elapsed time per iteration (ms): 16762.3 | learning rate: 4.411E-05 | global batch size: 64 | lm loss: 6.450273E+00 | loss scale: 4096.0 | grad norm: 84948.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5692/ 159576 | consumed samples: 159536 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.413E-05 | global batch size: 64 | lm loss: 6.451497E+00 | loss scale: 4096.0 | grad norm: 160376.410 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5693/ 159576 | consumed samples: 159600 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.415E-05 | global batch size: 64 | lm loss: 6.414182E+00 | loss scale: 4096.0 | grad norm: 64931.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5694/ 159576 | consumed samples: 159664 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.417E-05 | global batch size: 64 | lm loss: 6.392116E+00 | loss scale: 4096.0 | grad norm: 82604.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5695/ 159576 | consumed samples: 159728 | elapsed time per iteration (ms): 16621.3 | learning rate: 4.418E-05 | global batch size: 64 | lm loss: 6.379553E+00 | loss scale: 4096.0 | grad norm: 96286.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5696/ 159576 | consumed samples: 159792 | elapsed time per iteration (ms): 16447.4 | learning rate: 4.420E-05 | global batch size: 64 | lm loss: 6.319911E+00 | loss scale: 4096.0 | grad norm: 113489.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5697/ 159576 | consumed samples: 159856 | elapsed time per iteration (ms): 16402.2 | learning rate: 4.422E-05 | global batch size: 64 | lm loss: 6.538674E+00 | loss scale: 4096.0 | grad norm: 88380.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5698/ 159576 | consumed samples: 159920 | elapsed time per iteration (ms): 16462.3 | learning rate: 4.424E-05 | global batch size: 64 | lm loss: 6.450464E+00 | loss scale: 4096.0 | grad norm: 86213.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5699/ 159576 | consumed samples: 159984 | elapsed time per iteration (ms): 16600.9 | learning rate: 4.425E-05 | global batch size: 64 | lm loss: 6.360521E+00 | loss scale: 4096.0 | grad norm: 102839.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5700/ 159576 | consumed samples: 160048 | elapsed time per iteration (ms): 16472.4 | learning rate: 4.427E-05 | global batch size: 64 | lm loss: 6.408503E+00 | loss scale: 4096.0 | grad norm: 93708.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 01:07:57] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 01:07:57] PULSE: tr8-104B is running for 19:15:46 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8]))
iteration 5701/ 159576 | consumed samples: 160112 | elapsed time per iteration (ms): 16355.6 | learning rate: 4.429E-05 | global batch size: 64 | lm loss: 6.383047E+00 | loss scale: 4096.0 | grad norm: 277390.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5702/ 159576 | consumed samples: 160176 | elapsed time per iteration (ms): 16761.7 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.450840E+00 | loss scale: 4096.0 | grad norm: 91541.343
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5703/ 159576 | consumed samples: 160240 | elapsed time per iteration (ms): 9560.9 | learning rate: 4.431E-05 | global batch size: 64 | lm loss: 6.493016E+00 | loss scale: 2048.0 | grad norm: 91541.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5704/ 159576 | consumed samples: 160304 | elapsed time per iteration (ms): 15881.2 | learning rate: 4.433E-05 | global batch size: 64 | lm loss: 6.513262E+00 | loss scale: 2048.0 | grad norm: 63292.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5705/ 159576 | consumed samples: 160368 | elapsed time per iteration (ms): 16396.1 | learning rate: 4.434E-05 | global batch size: 64 | lm loss: 6.341697E+00 | loss scale: 2048.0 | grad norm: 49175.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5706/ 159576 | consumed samples: 160432 | elapsed time per iteration (ms): 16742.1 | learning rate: 4.436E-05 | global batch size: 64 | lm loss: 6.376310E+00 | loss scale: 2048.0 | grad norm: 49500.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5707/ 159576 | consumed samples: 160496 | elapsed time per iteration (ms): 16502.9 | learning rate: 4.438E-05 | global batch size: 64 | lm loss: 6.305195E+00 | loss scale: 2048.0 | grad norm: 66863.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5708/ 159576 | consumed samples: 160560 | elapsed time per iteration (ms): 16427.2 | learning rate: 4.440E-05 | global batch size: 64 | lm loss: 6.338213E+00 | loss scale: 2048.0 | grad norm: 49886.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5709/ 159576 | consumed samples: 160624 | elapsed time per iteration (ms): 16430.3 | learning rate: 4.441E-05 | global batch size: 64 | lm loss: 6.403567E+00 | loss scale: 2048.0 | grad norm: 67050.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5710/ 159576 | consumed samples: 160688 | elapsed time per iteration (ms): 16701.6 | learning rate: 4.443E-05 | global batch size: 64 | lm loss: 6.365169E+00 | loss scale: 2048.0 | grad norm: 65553.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5711/ 159576 | consumed samples: 160752 | elapsed time per iteration (ms): 16495.7 | learning rate: 4.445E-05 | global batch size: 64 | lm loss: 6.437389E+00 | loss scale: 2048.0 | grad norm: 42948.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5712/ 159576 | consumed samples: 160816 | elapsed time per iteration (ms): 16396.0 | learning rate: 4.447E-05 | global batch size: 64 | lm loss: 6.359374E+00 | loss scale: 2048.0 | grad norm: 47459.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5713/ 159576 | consumed samples: 160880 | elapsed time per iteration (ms): 16399.1 | learning rate: 4.449E-05 | global batch size: 64 | lm loss: 6.384996E+00 | loss scale: 2048.0 | grad norm: 54873.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5714/ 159576 | consumed samples: 160944 | elapsed time per iteration (ms): 16655.8 | learning 
rate: 4.450E-05 | global batch size: 64 | lm loss: 6.407744E+00 | loss scale: 2048.0 | grad norm: 49484.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5715/ 159576 | consumed samples: 161008 | elapsed time per iteration (ms): 16395.3 | learning rate: 4.452E-05 | global batch size: 64 | lm loss: 6.596529E+00 | loss scale: 2048.0 | grad norm: 56205.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5716/ 159576 | consumed samples: 161072 | elapsed time per iteration (ms): 16464.0 | learning rate: 4.454E-05 | global batch size: 64 | lm loss: 6.421166E+00 | loss scale: 2048.0 | grad norm: 62635.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5717/ 159576 | consumed samples: 161136 | elapsed time per iteration (ms): 16725.6 | learning rate: 4.456E-05 | global batch size: 64 | lm loss: 6.470579E+00 | loss scale: 2048.0 | grad norm: 63421.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5718/ 159576 | consumed samples: 161200 | elapsed time per iteration (ms): 16562.5 | learning rate: 4.457E-05 | global batch size: 64 | lm loss: 6.431957E+00 | loss scale: 2048.0 | grad norm: 41629.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5719/ 159576 | consumed samples: 161264 | elapsed time per iteration (ms): 16447.6 | learning rate: 4.459E-05 | global batch size: 64 | lm loss: 6.372540E+00 | loss scale: 2048.0 | grad norm: 52749.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5720/ 159576 | consumed samples: 161328 | elapsed time per iteration (ms): 16436.0 | learning rate: 4.461E-05 | global batch size: 64 | lm loss: 6.376571E+00 | loss scale: 2048.0 | grad norm: 152378.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5721/ 159576 | consumed samples: 161392 | elapsed time per iteration (ms): 16522.7 | learning rate: 4.463E-05 | global batch size: 64 | lm loss: 6.346034E+00 | loss scale: 2048.0 | grad norm: 79170.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5722/ 159576 | consumed samples: 161456 | elapsed time per iteration (ms): 16447.7 | learning rate: 4.464E-05 | global batch size: 64 | lm loss: 6.379195E+00 | loss scale: 2048.0 | grad norm: 54035.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5723/ 159576 | consumed samples: 161520 | elapsed time per iteration (ms): 16383.8 | learning rate: 4.466E-05 | global batch size: 64 | lm loss: 6.410875E+00 | loss scale: 2048.0 | grad norm: 122622.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5724/ 159576 | consumed samples: 161584 | elapsed time per iteration (ms): 16762.9 | learning rate: 4.468E-05 | global batch size: 64 | lm loss: 6.426128E+00 | loss scale: 2048.0 | grad norm: 61346.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5725/ 159576 | consumed samples: 161648 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.470E-05 | global batch size: 64 | lm loss: 6.440339E+00 | loss scale: 2048.0 | grad norm: 114917.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 5726/ 159576 | consumed samples: 161712 | elapsed time per iteration (ms): 16491.5 | learning rate: 4.472E-05 | global batch size: 64 | lm loss: 6.229801E+00 | loss scale: 2048.0 | grad norm: 43861.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5727/ 159576 | consumed samples: 161776 | elapsed time per iteration (ms): 16434.9 | learning rate: 4.473E-05 | global batch size: 64 | lm loss: 6.503794E+00 | loss scale: 2048.0 | grad norm: 59176.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5728/ 159576 | consumed samples: 161840 | elapsed time per iteration (ms): 16686.0 | learning rate: 4.475E-05 | global batch size: 64 | lm loss: 6.415756E+00 | loss scale: 2048.0 | grad norm: 62124.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5729/ 159576 | consumed samples: 161904 | elapsed time per iteration (ms): 16403.6 | learning rate: 4.477E-05 | global batch size: 64 | lm loss: 6.457495E+00 | loss scale: 2048.0 | grad norm: 56507.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5730/ 159576 | consumed samples: 161968 | elapsed time per iteration (ms): 16426.6 | learning rate: 4.479E-05 | global batch size: 64 | lm loss: 6.469141E+00 | loss scale: 2048.0 | grad norm: 61746.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5731/ 159576 | consumed samples: 162032 | elapsed time per iteration (ms): 16455.5 | learning rate: 4.480E-05 | global batch size: 64 | lm loss: 6.459309E+00 | loss scale: 2048.0 | grad norm: 59449.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5732/ 159576 | consumed samples: 162096 | elapsed time per iteration (ms): 16649.1 | learning rate: 4.482E-05 | global batch size: 64 | lm loss: 6.402276E+00 | loss scale: 2048.0 | grad norm: 46335.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5733/ 159576 | consumed samples: 162160 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.484E-05 | global batch size: 64 | lm loss: 6.519283E+00 | loss scale: 2048.0 | grad norm: 66042.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5734/ 159576 | consumed samples: 162224 | elapsed time per iteration (ms): 16320.8 | learning rate: 4.486E-05 | global batch size: 64 | lm loss: 6.357197E+00 | loss scale: 2048.0 | grad norm: 86317.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5735/ 159576 | consumed samples: 162288 | elapsed time per iteration (ms): 16817.7 | learning rate: 4.488E-05 | global batch size: 64 | lm loss: 6.412820E+00 | loss scale: 2048.0 | grad norm: 68051.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5736/ 159576 | consumed samples: 162352 | elapsed time per iteration (ms): 16374.0 | learning rate: 4.489E-05 | global batch size: 64 | lm loss: 6.409474E+00 | loss scale: 2048.0 | grad norm: 52474.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5737/ 159576 | consumed samples: 162416 | elapsed time per iteration (ms): 16279.5 | learning rate: 4.491E-05 | global batch size: 64 | lm loss: 6.432059E+00 | loss scale: 2048.0 | 
grad norm: 60932.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5738/ 159576 | consumed samples: 162480 | elapsed time per iteration (ms): 16405.5 | learning rate: 4.493E-05 | global batch size: 64 | lm loss: 6.389083E+00 | loss scale: 2048.0 | grad norm: 97554.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5739/ 159576 | consumed samples: 162544 | elapsed time per iteration (ms): 16881.2 | learning rate: 4.495E-05 | global batch size: 64 | lm loss: 6.352797E+00 | loss scale: 2048.0 | grad norm: 56410.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5740/ 159576 | consumed samples: 162608 | elapsed time per iteration (ms): 16465.8 | learning rate: 4.496E-05 | global batch size: 64 | lm loss: 6.400247E+00 | loss scale: 2048.0 | grad norm: 67543.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5741/ 159576 | consumed samples: 162672 | elapsed time per iteration (ms): 16430.8 | learning rate: 4.498E-05 | global batch size: 64 | lm loss: 6.361669E+00 | loss scale: 2048.0 | grad norm: 49133.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5742/ 159576 | consumed samples: 162736 | elapsed time per iteration (ms): 16371.1 | learning rate: 4.500E-05 | global batch size: 64 | lm loss: 6.415005E+00 | loss scale: 2048.0 | grad norm: 84089.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5743/ 159576 | consumed samples: 162800 | elapsed time per iteration (ms): 16700.6 | learning rate: 4.502E-05 | global batch size: 64 | lm loss: 6.365685E+00 | loss scale: 2048.0 | grad norm: 51630.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5744/ 159576 | consumed samples: 162864 | elapsed time per iteration (ms): 16325.3 | learning rate: 4.504E-05 | global batch size: 64 | lm loss: 6.440388E+00 | loss scale: 2048.0 | grad norm: 72309.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5745/ 159576 | consumed samples: 162928 | elapsed time per iteration (ms): 16329.9 | learning rate: 4.505E-05 | global batch size: 64 | lm loss: 6.466510E+00 | loss scale: 2048.0 | grad norm: 42690.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5746/ 159576 | consumed samples: 162992 | elapsed time per iteration (ms): 16621.4 | learning rate: 4.507E-05 | global batch size: 64 | lm loss: 6.487222E+00 | loss scale: 2048.0 | grad norm: 71804.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5747/ 159576 | consumed samples: 163056 | elapsed time per iteration (ms): 16495.0 | learning rate: 4.509E-05 | global batch size: 64 | lm loss: 6.362286E+00 | loss scale: 2048.0 | grad norm: 86678.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5748/ 159576 | consumed samples: 163120 | elapsed time per iteration (ms): 16346.4 | learning rate: 4.511E-05 | global batch size: 64 | lm loss: 6.356483E+00 | loss scale: 2048.0 | grad norm: 59964.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5749/ 159576 | consumed samples: 163184 | elapsed time per iteration (ms): 
16441.6 | learning rate: 4.512E-05 | global batch size: 64 | lm loss: 6.417390E+00 | loss scale: 2048.0 | grad norm: 50380.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5750/ 159576 | consumed samples: 163248 | elapsed time per iteration (ms): 16658.5 | learning rate: 4.514E-05 | global batch size: 64 | lm loss: 6.274541E+00 | loss scale: 2048.0 | grad norm: 39059.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5751/ 159576 | consumed samples: 163312 | elapsed time per iteration (ms): 16405.5 | learning rate: 4.516E-05 | global batch size: 64 | lm loss: 6.367218E+00 | loss scale: 2048.0 | grad norm: 51183.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5752/ 159576 | consumed samples: 163376 | elapsed time per iteration (ms): 16320.2 | learning rate: 4.518E-05 | global batch size: 64 | lm loss: 6.344701E+00 | loss scale: 2048.0 | grad norm: 36962.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5753/ 159576 | consumed samples: 163440 | elapsed time per iteration (ms): 16390.0 | learning rate: 4.520E-05 | global batch size: 64 | lm loss: 6.400953E+00 | loss scale: 2048.0 | grad norm: 66022.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5754/ 159576 | consumed samples: 163504 | elapsed time per iteration (ms): 16546.1 | learning rate: 4.521E-05 | global batch size: 64 | lm loss: 6.378292E+00 | loss scale: 2048.0 | grad norm: 51492.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5755/ 159576 | consumed samples: 163568 | elapsed time per iteration (ms): 16433.9 | learning rate: 4.523E-05 | global batch size: 64 | lm loss: 6.447009E+00 | loss scale: 2048.0 | grad norm: 67150.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5756/ 159576 | consumed samples: 163632 | elapsed time per iteration (ms): 16359.3 | learning rate: 4.525E-05 | global batch size: 64 | lm loss: 6.393310E+00 | loss scale: 2048.0 | grad norm: 47124.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5757/ 159576 | consumed samples: 163696 | elapsed time per iteration (ms): 16714.1 | learning rate: 4.527E-05 | global batch size: 64 | lm loss: 6.428847E+00 | loss scale: 2048.0 | grad norm: 73984.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5758/ 159576 | consumed samples: 163760 | elapsed time per iteration (ms): 16285.5 | learning rate: 4.528E-05 | global batch size: 64 | lm loss: 6.410369E+00 | loss scale: 2048.0 | grad norm: 51894.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5759/ 159576 | consumed samples: 163824 | elapsed time per iteration (ms): 16346.5 | learning rate: 4.530E-05 | global batch size: 64 | lm loss: 6.361977E+00 | loss scale: 2048.0 | grad norm: 46022.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5760/ 159576 | consumed samples: 163888 | elapsed time per iteration (ms): 16363.4 | learning rate: 4.532E-05 | global batch size: 64 | lm loss: 6.411450E+00 | loss scale: 2048.0 | grad norm: 62804.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 5761/ 159576 | consumed samples: 163952 | elapsed time per iteration (ms): 16576.6 | learning rate: 4.534E-05 | global batch size: 64 | lm loss: 6.492290E+00 | loss scale: 2048.0 | grad norm: 91376.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5762/ 159576 | consumed samples: 164016 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.536E-05 | global batch size: 64 | lm loss: 6.351690E+00 | loss scale: 2048.0 | grad norm: 56460.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5763/ 159576 | consumed samples: 164080 | elapsed time per iteration (ms): 16419.8 | learning rate: 4.537E-05 | global batch size: 64 | lm loss: 6.388021E+00 | loss scale: 2048.0 | grad norm: 48184.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5764/ 159576 | consumed samples: 164144 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.539E-05 | global batch size: 64 | lm loss: 6.500803E+00 | loss scale: 2048.0 | grad norm: 47702.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5765/ 159576 | consumed samples: 164208 | elapsed time per iteration (ms): 16601.8 | learning rate: 4.541E-05 | global batch size: 64 | lm loss: 6.377601E+00 | loss scale: 2048.0 | grad norm: 52558.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5766/ 159576 | consumed samples: 164272 | elapsed time per iteration (ms): 16306.8 | learning rate: 4.543E-05 | global batch size: 64 | lm loss: 6.348913E+00 | loss scale: 2048.0 | grad norm: 75335.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5767/ 159576 | consumed samples: 164336 | elapsed time per iteration (ms): 16391.8 | learning rate: 4.544E-05 | global batch size: 64 | lm loss: 6.287434E+00 | loss scale: 2048.0 | grad norm: 51886.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5768/ 159576 | consumed samples: 164400 | elapsed time per iteration (ms): 16644.5 | learning rate: 4.546E-05 | global batch size: 64 | lm loss: 6.409395E+00 | loss scale: 2048.0 | grad norm: 59368.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5769/ 159576 | consumed samples: 164464 | elapsed time per iteration (ms): 16355.1 | learning rate: 4.548E-05 | global batch size: 64 | lm loss: 6.376360E+00 | loss scale: 2048.0 | grad norm: 45775.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5770/ 159576 | consumed samples: 164528 | elapsed time per iteration (ms): 16317.3 | learning rate: 4.550E-05 | global batch size: 64 | lm loss: 6.428416E+00 | loss scale: 2048.0 | grad norm: 53234.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5771/ 159576 | consumed samples: 164592 | elapsed time per iteration (ms): 16327.7 | learning rate: 4.551E-05 | global batch size: 64 | lm loss: 6.374567E+00 | loss scale: 2048.0 | grad norm: 44963.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5772/ 159576 | consumed samples: 164656 | elapsed time per iteration (ms): 16674.7 | learning rate: 4.553E-05 | global batch size: 64 | lm loss: 6.357097E+00 | 
loss scale: 2048.0 | grad norm: 47484.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5773/ 159576 | consumed samples: 164720 | elapsed time per iteration (ms): 16463.9 | learning rate: 4.555E-05 | global batch size: 64 | lm loss: 6.398357E+00 | loss scale: 2048.0 | grad norm: 41638.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5774/ 159576 | consumed samples: 164784 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.557E-05 | global batch size: 64 | lm loss: 6.351582E+00 | loss scale: 2048.0 | grad norm: 54903.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5775/ 159576 | consumed samples: 164848 | elapsed time per iteration (ms): 16736.5 | learning rate: 4.559E-05 | global batch size: 64 | lm loss: 6.367338E+00 | loss scale: 2048.0 | grad norm: 43171.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5776/ 159576 | consumed samples: 164912 | elapsed time per iteration (ms): 16420.4 | learning rate: 4.560E-05 | global batch size: 64 | lm loss: 6.386267E+00 | loss scale: 2048.0 | grad norm: 68637.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5777/ 159576 | consumed samples: 164976 | elapsed time per iteration (ms): 16467.1 | learning rate: 4.562E-05 | global batch size: 64 | lm loss: 6.368368E+00 | loss scale: 2048.0 | grad norm: 47557.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5778/ 159576 | consumed samples: 165040 | elapsed time per iteration (ms): 16383.6 | learning rate: 4.564E-05 | global batch size: 64 | lm loss: 6.360928E+00 | loss scale: 2048.0 | grad norm: 48661.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5779/ 159576 | consumed samples: 165104 | elapsed time per iteration (ms): 16795.3 | learning rate: 4.566E-05 | global batch size: 64 | lm loss: 6.286585E+00 | loss scale: 2048.0 | grad norm: 41957.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5780/ 159576 | consumed samples: 165168 | elapsed time per iteration (ms): 16414.6 | learning rate: 4.567E-05 | global batch size: 64 | lm loss: 6.329445E+00 | loss scale: 2048.0 | grad norm: 58532.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5781/ 159576 | consumed samples: 165232 | elapsed time per iteration (ms): 16413.2 | learning rate: 4.569E-05 | global batch size: 64 | lm loss: 6.447413E+00 | loss scale: 2048.0 | grad norm: 58971.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5782/ 159576 | consumed samples: 165296 | elapsed time per iteration (ms): 16345.1 | learning rate: 4.571E-05 | global batch size: 64 | lm loss: 6.367276E+00 | loss scale: 2048.0 | grad norm: 62853.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5783/ 159576 | consumed samples: 165360 | elapsed time per iteration (ms): 16700.8 | learning rate: 4.573E-05 | global batch size: 64 | lm loss: 6.394166E+00 | loss scale: 2048.0 | grad norm: 104426.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5784/ 159576 | consumed samples: 165424 | elapsed 
time per iteration (ms): 16276.5 | learning rate: 4.575E-05 | global batch size: 64 | lm loss: 6.447882E+00 | loss scale: 2048.0 | grad norm: 50564.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5785/ 159576 | consumed samples: 165488 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.576E-05 | global batch size: 64 | lm loss: 6.341421E+00 | loss scale: 2048.0 | grad norm: 126331.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5786/ 159576 | consumed samples: 165552 | elapsed time per iteration (ms): 16792.0 | learning rate: 4.578E-05 | global batch size: 64 | lm loss: 6.384687E+00 | loss scale: 2048.0 | grad norm: 54058.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5787/ 159576 | consumed samples: 165616 | elapsed time per iteration (ms): 16388.2 | learning rate: 4.580E-05 | global batch size: 64 | lm loss: 6.392807E+00 | loss scale: 2048.0 | grad norm: 59371.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5788/ 159576 | consumed samples: 165680 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.582E-05 | global batch size: 64 | lm loss: 6.457485E+00 | loss scale: 2048.0 | grad norm: 65736.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5789/ 159576 | consumed samples: 165744 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.583E-05 | global batch size: 64 | lm loss: 6.370594E+00 | loss scale: 2048.0 | grad norm: 86846.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5790/ 159576 | consumed samples: 165808 | elapsed time per iteration (ms): 16857.0 | learning rate: 4.585E-05 | global batch size: 64 | lm loss: 6.412526E+00 | loss scale: 2048.0 | grad norm: 77325.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5791/ 159576 | consumed samples: 165872 | elapsed time per iteration (ms): 16398.4 | learning rate: 4.587E-05 | global batch size: 64 | lm loss: 6.412295E+00 | loss scale: 2048.0 | grad norm: 50166.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5792/ 159576 | consumed samples: 165936 | elapsed time per iteration (ms): 16290.5 | learning rate: 4.589E-05 | global batch size: 64 | lm loss: 6.380277E+00 | loss scale: 2048.0 | grad norm: 48226.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5793/ 159576 | consumed samples: 166000 | elapsed time per iteration (ms): 16371.0 | learning rate: 4.591E-05 | global batch size: 64 | lm loss: 6.359699E+00 | loss scale: 2048.0 | grad norm: 65168.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5794/ 159576 | consumed samples: 166064 | elapsed time per iteration (ms): 16645.3 | learning rate: 4.592E-05 | global batch size: 64 | lm loss: 6.321030E+00 | loss scale: 2048.0 | grad norm: 52186.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5795/ 159576 | consumed samples: 166128 | elapsed time per iteration (ms): 16469.4 | learning rate: 4.594E-05 | global batch size: 64 | lm loss: 6.393083E+00 | loss scale: 2048.0 | grad norm: 55272.030 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5796/ 159576 | consumed samples: 166192 | elapsed time per iteration (ms): 16425.9 | learning rate: 4.596E-05 | global batch size: 64 | lm loss: 6.374780E+00 | loss scale: 2048.0 | grad norm: 53939.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5797/ 159576 | consumed samples: 166256 | elapsed time per iteration (ms): 16770.7 | learning rate: 4.598E-05 | global batch size: 64 | lm loss: 6.376060E+00 | loss scale: 2048.0 | grad norm: 62276.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5798/ 159576 | consumed samples: 166320 | elapsed time per iteration (ms): 16339.0 | learning rate: 4.599E-05 | global batch size: 64 | lm loss: 6.463357E+00 | loss scale: 2048.0 | grad norm: 55276.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5799/ 159576 | consumed samples: 166384 | elapsed time per iteration (ms): 16400.6 | learning rate: 4.601E-05 | global batch size: 64 | lm loss: 6.364144E+00 | loss scale: 2048.0 | grad norm: 46941.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5800/ 159576 | consumed samples: 166448 | elapsed time per iteration (ms): 16328.3 | learning rate: 4.603E-05 | global batch size: 64 | lm loss: 6.412081E+00 | loss scale: 2048.0 | grad norm: 61281.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5801/ 159576 | consumed samples: 166512 | elapsed time per iteration (ms): 16791.0 | learning rate: 4.605E-05 | global batch size: 64 | lm loss: 6.396990E+00 | loss scale: 2048.0 | grad norm: 90543.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5802/ 159576 | consumed samples: 166576 | elapsed time per iteration (ms): 16555.9 | learning rate: 4.607E-05 | global batch size: 64 | lm loss: 6.358585E+00 | loss scale: 2048.0 | grad norm: 43097.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5803/ 159576 | consumed samples: 166640 | elapsed time per iteration (ms): 16465.5 | learning rate: 4.608E-05 | global batch size: 64 | lm loss: 6.493999E+00 | loss scale: 2048.0 | grad norm: 45567.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5804/ 159576 | consumed samples: 166704 | elapsed time per iteration (ms): 16436.4 | learning rate: 4.610E-05 | global batch size: 64 | lm loss: 6.533109E+00 | loss scale: 2048.0 | grad norm: 127288.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5805/ 159576 | consumed samples: 166768 | elapsed time per iteration (ms): 16549.3 | learning rate: 4.612E-05 | global batch size: 64 | lm loss: 6.379089E+00 | loss scale: 2048.0 | grad norm: 48002.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5806/ 159576 | consumed samples: 166832 | elapsed time per iteration (ms): 16407.1 | learning rate: 4.614E-05 | global batch size: 64 | lm loss: 6.365424E+00 | loss scale: 2048.0 | grad norm: 49891.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5807/ 159576 | consumed samples: 166896 | elapsed time per iteration (ms): 16379.2 | learning rate: 4.615E-05 | global batch size: 
64 | lm loss: 6.476014E+00 | loss scale: 2048.0 | grad norm: 47532.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5808/ 159576 | consumed samples: 166960 | elapsed time per iteration (ms): 16753.6 | learning rate: 4.617E-05 | global batch size: 64 | lm loss: 6.354483E+00 | loss scale: 2048.0 | grad norm: 56392.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5809/ 159576 | consumed samples: 167024 | elapsed time per iteration (ms): 16393.4 | learning rate: 4.619E-05 | global batch size: 64 | lm loss: 6.519560E+00 | loss scale: 2048.0 | grad norm: 44344.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5810/ 159576 | consumed samples: 167088 | elapsed time per iteration (ms): 16492.5 | learning rate: 4.621E-05 | global batch size: 64 | lm loss: 6.408142E+00 | loss scale: 2048.0 | grad norm: 49620.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5811/ 159576 | consumed samples: 167152 | elapsed time per iteration (ms): 16428.1 | learning rate: 4.622E-05 | global batch size: 64 | lm loss: 6.376643E+00 | loss scale: 2048.0 | grad norm: 54930.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5812/ 159576 | consumed samples: 167216 | elapsed time per iteration (ms): 16603.5 | learning rate: 4.624E-05 | global batch size: 64 | lm loss: 6.446056E+00 | loss scale: 2048.0 | grad norm: 49991.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5813/ 159576 | consumed samples: 167280 | elapsed time per iteration (ms): 16423.7 | learning rate: 4.626E-05 | global batch size: 64 | lm loss: 6.503972E+00 | loss scale: 2048.0 | grad norm: 48324.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5814/ 159576 | consumed samples: 167344 | elapsed time per iteration (ms): 16392.6 | learning rate: 4.628E-05 | global batch size: 64 | lm loss: 6.483917E+00 | loss scale: 2048.0 | grad norm: 49344.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5815/ 159576 | consumed samples: 167408 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.630E-05 | global batch size: 64 | lm loss: 6.359298E+00 | loss scale: 2048.0 | grad norm: 46826.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5816/ 159576 | consumed samples: 167472 | elapsed time per iteration (ms): 16791.2 | learning rate: 4.631E-05 | global batch size: 64 | lm loss: 6.477077E+00 | loss scale: 2048.0 | grad norm: 80606.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5817/ 159576 | consumed samples: 167536 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.633E-05 | global batch size: 64 | lm loss: 6.378170E+00 | loss scale: 2048.0 | grad norm: 50159.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5818/ 159576 | consumed samples: 167600 | elapsed time per iteration (ms): 16473.7 | learning rate: 4.635E-05 | global batch size: 64 | lm loss: 6.336848E+00 | loss scale: 2048.0 | grad norm: 68729.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5819/ 159576 | consumed 
samples: 167664 | elapsed time per iteration (ms): 16753.1 | learning rate: 4.637E-05 | global batch size: 64 | lm loss: 6.448166E+00 | loss scale: 2048.0 | grad norm: 53348.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5820/ 159576 | consumed samples: 167728 | elapsed time per iteration (ms): 16453.7 | learning rate: 4.638E-05 | global batch size: 64 | lm loss: 6.433999E+00 | loss scale: 2048.0 | grad norm: 56781.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5821/ 159576 | consumed samples: 167792 | elapsed time per iteration (ms): 16425.7 | learning rate: 4.640E-05 | global batch size: 64 | lm loss: 6.397796E+00 | loss scale: 2048.0 | grad norm: 51600.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5822/ 159576 | consumed samples: 167856 | elapsed time per iteration (ms): 16451.4 | learning rate: 4.642E-05 | global batch size: 64 | lm loss: 6.353134E+00 | loss scale: 2048.0 | grad norm: 49519.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5823/ 159576 | consumed samples: 167920 | elapsed time per iteration (ms): 16634.5 | learning rate: 4.644E-05 | global batch size: 64 | lm loss: 6.402969E+00 | loss scale: 2048.0 | grad norm: 52985.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5824/ 159576 | consumed samples: 167984 | elapsed time per iteration (ms): 16465.1 | learning rate: 4.646E-05 | global batch size: 64 | lm loss: 6.411339E+00 | loss scale: 2048.0 | grad norm: 61452.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5825/ 159576 | consumed samples: 168048 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.647E-05 | global batch size: 64 | lm loss: 6.406009E+00 | loss scale: 2048.0 | grad norm: 58959.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5826/ 159576 | consumed samples: 168112 | elapsed time per iteration (ms): 16992.0 | learning rate: 4.649E-05 | global batch size: 64 | lm loss: 6.401187E+00 | loss scale: 2048.0 | grad norm: 61581.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 5827/ 159576 | consumed samples: 168176 | elapsed time per iteration (ms): 16478.9 | learning rate: 4.651E-05 | global batch size: 64 | lm loss: 6.463509E+00 | loss scale: 2048.0 | grad norm: 62591.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-25 01:42:45,269] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step5827/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 5827 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 18278.37 [exiting program after 1190.1195526679357 minutes] datetime: 2021-09-25 01:42:58 ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 
979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit 
raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:43:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:44:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
[... the identical traceback repeated for every scheduled run, roughly once per minute, through 2021-09-25 01:50:20 CEST ...]
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:50:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:51:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after 
shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown ERROR:apscheduler.scheduler:Error submitting job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 01:52:20 CEST)" to executor "default" Traceback (most recent call last): File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/schedulers/base.py", line 979, in _process_jobs executor.submit_job(job, run_times) File 
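For the record, this failure mode is generic to concurrent.futures rather than specific to codecarbon's tracker: APScheduler's default executor wraps a ThreadPoolExecutor, and once that pool has been shut down (here, during teardown) any further submit() raises exactly this RuntimeError. A minimal sketch using only the standard library (not taken from the job itself):

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=1)
    pool.shutdown(wait=True)        # analogous to the executor being torn down at exit
    try:
        pool.submit(print, "tick")  # analogous to the scheduler's next interval tick
    except RuntimeError as err:
        print(err)                  # -> cannot schedule new futures after shutdown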
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/base.py", line 71, in submit_job self._do_submit_job(job, run_times) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/apscheduler/executors/pool.py", line 28, in _do_submit_job f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/concurrent/futures/thread.py", line 179, in submit raise RuntimeError('cannot schedule new futures after shutdown') RuntimeError: cannot schedule new futures after shutdown srun: Job step aborted: Waiting up to 62 seconds for job step to finish. Killing subprocess 48057 Killing subprocess 31112 Killing subprocess 48058 Killing subprocess 48059 Killing subprocess 48061 Killing subprocess 31113 Killing subprocess 31114 Killing subprocess 31115 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 77714 Killing subprocess 28453 Killing subprocess 38924 Killing subprocess 54762 Killing subprocess 77715 Killing subprocess 77716 Killing subprocess 77717 Main process received SIGTERM, exiting Killing subprocess 22919 Killing subprocess 28454 Killing subprocess 28455 Killing subprocess 28456 Main process received SIGTERM, exiting Killing subprocess 54763 Killing subprocess 54764 Killing subprocess 54765 Killing subprocess 38925 Killing subprocess 38926 Killing subprocess 38927 Killing subprocess 22920 Main process received SIGTERM, exiting Killing subprocess 22921 Killing subprocess 22923 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 19021 Killing subprocess 45366 Killing subprocess 19022 Killing subprocess 19023 Killing subprocess 19024 Main process received SIGTERM, exiting Killing subprocess 45367 Killing subprocess 45368 Killing subprocess 45370 Main process received SIGTERM, exiting Killing subprocess 48440 Killing subprocess 81370 Killing subprocess 48441 Killing subprocess 48442 Killing subprocess 48443 Main process received SIGTERM, exiting Killing subprocess 81371 Killing subprocess 81372 Killing subprocess 81373 Main process received SIGTERM, exiting Killing subprocess 65810 Killing subprocess 65811 Killing subprocess 65812 Killing subprocess 65813 Main process received SIGTERM, exiting [2021-09-25 02:08:19] PULSE: tr8-104B is waiting to be scheduled (1165978_[1-10%1] on 'gpu_p13' partition) ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[Each rank then prints an identical DeepSpeed extension op report; the interleaved per-rank copies have been collapsed into the single clean copy below.]

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
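For reference, the same compatibility table can be reproduced outside a training run with the ds_report console script that ships with DeepSpeed, or queried programmatically. A minimal sketch, assuming the per-op builder classes in deepspeed.ops.op_builder match the DeepSpeed release that produced this log:

from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder, SparseAttnBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder(), SparseAttnBuilder()):
    # is_compatible() only checks build-time dependencies; an op that is
    # not pre-installed ([NO]) can still report compatible ([OKAY]).
    print(f"{builder.NAME:<22} compatible={builder.is_compatible()}")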
[The environment check that follows is likewise printed once per rank; collapsed to one copy.]

[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
op name................ ................installed installedinstalled.................. ..compatibleinstalled.. compatiblecompatible--------------------------------------------------.. ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... .....................[YES]............... [YES] [YES] [OKAY]...... ............ [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adam .............fused_adam [NO]............. fused_lamb....... ............. [NO] .............[OKAY] [NO] ....... [NO] ....... fused_lamb .......[OKAY] [OKAY] ............. [OKAY] [NO] .......fused_lamb fused_lamb[OKAY] .......................... [NO][NO] .............. [OKAY][OKAY]sparse_attn ............ [NO] .......sparse_attn [OKAY]............ [NO] ....... sparse_attnsparse_attntransformer[OKAY] .................................... [NO] transformer[NO] [NO]................... ....... .......[NO] .......[OKAY][OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformerstochastic_transformer transformer............stochastic_transformer. ............ [NO]. [NO] [NO] [NO]....... ....... ..............[OKAY][OKAY] [OKAY][OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name ................................................ installed ................installed installed installed ...... ..compatiblecompatiblecompatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam ...............cpu_adam............... [YES] [YES] ............... [YES]...... ...... [YES]......[OKAY][OKAY] ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] fused_adam............. fused_adam ....... .............[NO] ............. .......[OKAY][NO][NO] [OKAY].......fused_lamb....... 
[OKAY][OKAY].............fused_lamb [NO].............fused_lambfused_lamb [NO].................... .............[NO] .......[NO][OKAY] ....... [OKAY] ....... [OKAY] [OKAY] sparse_attn sparse_attn............sparse_attn sparse_attn........................[NO] ............[NO] [NO] .......[NO] ....... ....... .......[OKAY] [OKAY] [OKAY] [OKAY] transformer ............transformertransformer transformer ........................[NO]............ [NO] [NO] ....... [NO]....... ....... [OKAY]....... [OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer .stochastic_transformer . [NO] . [NO]. ....... [NO][NO][OKAY]....... ..............[OKAY] [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name................ op name................op name installed installed.................. ................ .. compatibleinstalled installed compatible..-------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES]cpu_adam[YES] cpu_adam ........................... [OKAY]...............[OKAY] [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[NO] [NO]....... fused_adamfused_adam.......[OKAY] [OKAY]............. .............[NO] [NO]fused_lambfused_lamb....... ............. ............. [OKAY]....... [NO] [NO][OKAY]....... fused_lamb.......[OKAY] [OKAY]fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attn.......sparse_attn[OKAY] ............[OKAY]............ transformer[NO][NO] transformer................... ................... [NO][OKAY] [OKAY] [NO] ....... transformer transformer [OKAY]....... ............ ............[OKAY][NO] stochastic_transformer [NO] ....... stochastic_transformer ........ [OKAY] .[NO] [OKAY][NO] .......stochastic_transformer ....... [OKAY] [OKAY]stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name................ ................ ................ ................installed installedinstalled installed .... .. ..compatiblecompatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] ............... .............................. ...... [YES][YES][OKAY] [YES]...... ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO]fused_adam fused_adam.................... fused_adam .............[NO][OKAY] .............[NO]....... [NO] fused_lamb.......[OKAY] ............. ....... [OKAY] [NO] fused_lamb [OKAY]fused_lamb ............. ....... [NO].............[OKAY] .......fused_lamb[NO] [OKAY].................... [NO][OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attntransformer sparse_attn.................................... [NO][NO]............ [NO] .......[NO]....... .......[OKAY][OKAY]....... [OKAY][OKAY] transformerstochastic_transformer transformer transformer............ .........................[NO] [NO] [NO] .......[NO] ....... [OKAY].............. [OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer stochastic_transformer. ..[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- op name --------------------------------------------------................ --------------------------------------------------op name op nameinstalled .................................. op name installedcompatible installed ..................-------------------------------------------------- installed..compatible ..compatible -------------------------------------------------- compatible --------------------------------------------------cpu_adam -------------------------------------------------- ...............cpu_adam [YES]............... ......[YES] cpu_adam [OKAY]cpu_adam ...... -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ...............[OKAY]............... [YES][YES] ............ [OKAY][OKAY] meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja fused_adam ............. [NO] fused_adam....... .............[OKAY] [NO]fused_adamfused_adam fused_lamb ....... .......................... ............. 
[OKAY][NO][NO][NO] ..............fused_lamb ....... [OKAY] [OKAY] .............[OKAY] [NO] fused_lamb .................... fused_lamb[OKAY] .............[NO]sparse_attn [NO]................... .......[NO][OKAY] .......[OKAY] [OKAY]sparse_attn ............ [NO] transformer....... ............[OKAY] [NO] sparse_attn.......sparse_attn transformer[OKAY] ............ ........................ [NO][NO]stochastic_transformer[NO] ............... ....... [OKAY][OKAY][OKAY][NO] .......stochastic_transformer transformer [OKAY] transformer ............. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op nameop name................ ................ ................ 
................installed installed ..installed installed ..compatible ....--------------------------------------------------compatible compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam............... cpu_adam[OKAY] [YES].............................. ......[YES][YES] ......[OKAY]...... fused_adam [OKAY][OKAY]............. [NO] ....... [OKAY] fused_adam .............fused_lamb [NO]fused_adam............. fused_adam .......[NO] ............. .................... [OKAY] [NO] [NO] [OKAY] ..............fused_lamb [OKAY][OKAY]............. [NO] fused_lamb.......fused_lambsparse_attn .............[OKAY]......................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] sparse_attn ............transformer [NO]............ .......[NO] [OKAY]....... [OKAY] sparse_attnsparse_attn transformerstochastic_transformer............ ........................ [NO] . [NO][NO] .............. [NO] .......[OKAY] [OKAY] [OKAY]....... [OKAY]stochastic_transformer transformer transformer ......................... [NO][NO][NO] ....... [OKAY] ninjaninjaninjaninja .................. ....................................[OKAY].................. .............. stochastic_transformer[OKAY][OKAY] [OKAY][OKAY]--------------------------------------------------[OKAY] --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- op name . [NO]stochastic_transformer ....... .[OKAY] [NO] ....... [OKAY] ................ ................op nameop nameinstalled ................ installed .................. installed .. compatible installed.. compatible --------------------------------------------------compatible--------------------------------------------------.. --------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam ..................... ...... ............... [OKAY][YES] [OKAY] ......[YES] [OKAY]...... [OKAY] fused_adamfused_adam .......................... [NO][NO] fused_adam.............. fused_adam............. [OKAY][OKAY].............[NO] .......[NO]fused_lamb [OKAY]fused_lamb.................... [NO]............. fused_lamb[OKAY].......[NO] [OKAY]............. ....... [NO][OKAY] .......fused_lamb [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ............. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name sparse_attn ............sparse_attn [NO]............ .......[NO] sparse_attn [OKAY] ....... ............ [OKAY][NO]transformer op name ................................ op name................ installed installed..installed ................ .. compatible.. installed compatible -------------------------------------------------- sparse_attn...................transformer [OKAY]............[NO]............ .......[NO] [NO] transformer[OKAY] ....... ..compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- ....... ............ 
[OKAY] stochastic_transformer [OKAY][NO] cpu_adam cpu_adam............... ...............cpu_adam[YES] [YES]......cpu_adam............... [OKAY] ......[YES] ........ stochastic_transformer transformer [NO][OKAY] .................... stochastic_transformer [NO][NO] [OKAY] ....... ............... [OKAY] ...... .[OKAY] ....... [NO] [OKAY]....... [OKAY] [YES] [OKAY] ...... [OKAY]fused_adam ............. [NO]fused_adam .................... fused_adam [OKAY] [NO] stochastic_transformer . [NO] ....... [OKAY] ............. .......[NO]fused_lamb fused_adam [OKAY]............. ....... [NO].............[OKAY] fused_lamb....... .............[NO]fused_lamb[OKAY] [NO] .................... .......[NO][OKAY] [OKAY] ....... [OKAY] sparse_attnfused_lamb ......................... [NO] [NO]....... .......sparse_attnsparse_attn[OKAY] ........................[OKAY] [NO][NO] transformer ....... ....... ............ [OKAY][OKAY][NO] ....... transformer[OKAY] transformer ............ sparse_attn............[NO]stochastic_transformer .......[NO] ............[OKAY]. .......[NO] [NO] [OKAY]stochastic_transformer ....... ........[OKAY] stochastic_transformer[OKAY][NO] ........ [NO][OKAY]transformer ................... [OKAY] [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name op nameop name ................ ................................installed ................ installed installedinstalled.. ..compatible ....compatible-------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ...............cpu_adam...... cpu_adam ............... [YES][OKAY] ............... [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam .............fused_lamb.......................... [NO].............[NO] [NO] ....... .......[NO]....... [OKAY].......[OKAY][OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb fused_lambfused_lamb............. .............[NO]............. [NO] ....... [NO] ....... [OKAY]sparse_attn .......[OKAY]............ [OKAY][NO] async_io ............... [NO] ....... [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformersparse_attn ........................ sparse_attn [NO][NO]sparse_attn ...................................... [OKAY][OKAY][NO][NO] .............. transformer[OKAY][OKAY]stochastic_transformer utils .................. [YES] ...... [OKAY] ............. transformer[NO] transformer[NO]................... ................... [OKAY][NO] [OKAY] [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- .......stochastic_transformer .......[OKAY]. [OKAY][NO] stochastic_transformer....... [OKAY]stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name ................................ ................ ................ installedinstalled installed..installed.. compatible.... compatible --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam............... ............... ......[YES] ............... [OKAY][YES] ...... [YES]...... 
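The [OKAY] column above only says an op can be built; ops marked [NO] under "installed" are compiled by ninja the first time they are used. A minimal sketch of checking and triggering that build from Python, assuming this DeepSpeed version exposes its op builders under deepspeed.ops.op_builder (FusedAdamBuilder, matching the fused_adam row above, is an assumption about the builder name):

# Minimal sketch, assuming deepspeed.ops.op_builder.FusedAdamBuilder exists
# in this DeepSpeed version; it backs the `fused_adam` row of the report.
from deepspeed.ops.op_builder import FusedAdamBuilder

builder = FusedAdamBuilder()
print(builder.is_compatible())   # mirrors the [OKAY] column of the report
fused_adam = builder.load()      # op not installed ([NO]) -> ninja JIT build runs here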
[OKAY]......[OKAY] [OKAY] fused_adam ............. [NO] fused_adamfused_adam.......fused_adam .............[OKAY].......................... [NO][NO][NO]fused_lamb .................................. [NO][OKAY][OKAY] [OKAY]....... fused_lamb[OKAY]fused_lambfused_lamb ....................................... [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer sparse_attn............ ........................ ............[NO] [NO] [NO][NO]....... .....................[OKAY] [OKAY][OKAY] [OKAY]transformer transformer............ ............[NO] stochastic_transformertransformer [NO] ....... ............ ........ [OKAY] [NO][NO][OKAY] .............. stochastic_transformer [OKAY] [OKAY] stochastic_transformer . [NO]. stochastic_transformer....... [NO][OKAY] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report transformer_inference .. [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................op name op name ................ installed ................................installed .. installed installed ..compatible .. compatible ..-------------------------------------------------- compatible--------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES] .............................. [YES] ......[YES]......[YES] [OKAY] [OKAY]............ [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam .............fused_adam [NO].............fused_adamfused_adam [NO].................... [NO]............. .......[OKAY][NO]....... [OKAY].......[OKAY] fused_lamb async_io ............... [NO] ....... [NO] fused_lamb[OKAY].............fused_lamb .............[NO]............. [NO] fused_lamb .......[NO]....... [OKAY].............[OKAY]....... [NO][OKAY] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn sparse_attn............ sparse_attn............[NO] [NO]...................sparse_attn .......[NO][OKAY]............ [NO] [OKAY]....... quantizer .............. [NO] ....... [OKAY] ....... transformer [OKAY] [OKAY]............ -------------------------------------------------- transformer [NO]............ transformertransformer ....... [NO] ............ ............[OKAY] ....... [NO] [NO] [OKAY]stochastic_transformer ....... ........[OKAY] stochastic_transformer [OKAY] [NO] ........ stochastic_transformer[OKAY][NO]stochastic_transformer ......... [OKAY][NO] [NO] .............. [OKAY] [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name................op name op name................installed................ ..installed................installed ..compatibleinstalled.. compatible..compatible-------------------------------------------------- async_ioasync_io .............................. [NO][NO] .............. [NO][NO] compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] cpu_adam ............... [YES] ...... cpu_adamcpu_adam[OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ......fused_adam [OKAY] [OKAY] ............. [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
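Every launched rank prints this same report, which is why it repeats so many times in the raw log; the copy above is the single report each process emits. The same table can be reproduced on one process with DeepSpeed's `ds_report` command, or queried programmatically. A minimal sketch, assuming the `deepspeed.ops.op_builder` interface and its `is_compatible()` method are present in the installed DeepSpeed version:

    # Sketch: query op compatibility without launching a job
    # (assumes deepspeed.ops.op_builder exports these builder classes).
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # is_compatible() mirrors the "compatible" column of the report:
        # it checks whether the system can JIT-install the op.
        print(builder.name, builder.is_compatible())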
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
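A [NO] in the "installed" column paired with [OKAY] in the "compatible" column means the op was not prebuilt but will be JIT-compiled by ninja on first use, then reused from the extension cache (typically ~/.cache/torch_extensions). A minimal sketch of what triggers such a build, assuming the standard `deepspeed.ops.adam.FusedAdam` import path and a CUDA device:

    # Sketch: first construction of a JIT-compiled op kicks off the ninja
    # build; later runs reuse the cached extension.
    import torch
    from deepspeed.ops.adam import FusedAdam  # reported [NO]/[OKAY] above

    params = [torch.nn.Parameter(torch.zeros(4, device="cuda"))]
    opt = FusedAdam(params, lr=1e-3)  # JIT compilation happens here on first use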
 [WARNING]  async_io requires the libraries: ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[NO].......[OKAY] .......[OKAY] [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer_inference-------------------------------------------------- transformer_inference.. ..[NO] [NO]....... .......[OKAY] sparse_attn sparse_attntransformer............ sparse_attn ........................ [NO] [NO]............ [NO] .............. [NO] ....... [OKAY][OKAY]....... [OKAY] [OKAY] [OKAY]transformerstochastic_transformer utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] transformer ............ transformer ............[NO]. ............[NO].......[NO] [NO] [OKAY]....... .............. [OKAY][OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- stochastic_transformer stochastic_transformer.stochastic_transformer [NO]. ........[NO] [OKAY][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op nameop name................ ................ installed................installed................ installed.... installed .. compatible compatible.. compatible -------------------------------------------------- --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ......cpu_adam...............cpu_adam ...............[OKAY][YES]............... [YES]......[YES] ......[OKAY]...... [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam ............. 
[NO]fused_adamfused_lamb fused_adam .................... ............. [OKAY] .............[NO] [NO] [NO].............. fused_lamb ....... [OKAY][OKAY] ............. [OKAY] [NO] fused_lamb....... fused_lamb ............. [OKAY] .............[NO] sparse_attn.......[NO] [OKAY]................... [NO][OKAY] ....... [OKAY] sparse_attn ............transformer [NO]............ .......[NO] sparse_attn[OKAY]....... [OKAY]............sparse_attn transformer[NO]............ .......stochastic_transformer[NO] ............ [OKAY] ........[NO] [OKAY][NO]....... transformer.......[OKAY] ............transformer[OKAY] [NO]stochastic_transformer............ .......[NO] .[OKAY] [NO]....... .......[OKAY] stochastic_transformer[OKAY] .stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name ................op nameop nameop name installed................................................ ..installedinstalledinstalled .... compatible..compatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adamcpu_adam...............cpu_adam ..............................[YES]............... [YES][YES][YES] ...... .................. [OKAY] [OKAY][OKAY][OKAY] fused_adam ............. fused_adamfused_adamfused_adam[NO] ................................. ............. [OKAY] [NO] [NO] [NO]fused_lamb.............. .................... [OKAY] [OKAY] [OKAY][NO] .......fused_lamb [OKAY]fused_lambfused_lamb ....................................... [NO] [NO] [NO] ....... ....... .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn sparse_attn............ sparse_attn ............ ............[NO] ...................[NO][NO] [NO][OKAY].............. [OKAY] ....... [OKAY] stochastic_transformer [OKAY] transformer.transformer [NO] transformer........................ .......[NO]............ [NO] [OKAY]....... [NO].......[OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. .[NO][NO] .......[NO]....... [OKAY].......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] ....... [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY] [OKAY] async_io ............... [NO] ....... [NO] utils ..................utils [YES].................. ......[YES] [OKAY]...... transformer_inference .. [NO] ....... [OKAY] [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. quantizer[NO] ....... [OKAY] .............. [NO] .......-------------------------------------------------- quantizer .............. [NO] ....... [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................ op nameop name................ installed................................ installed .. installed..installed compatible compatible .. 
..-------------------------------------------------- -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]...............[YES]............... ...... [YES]...... [YES][OKAY] [OKAY] ...... ...... [OKAY][OKAY] fused_adam ............. fused_adam[NO] ....................fused_adam fused_adam [OKAY] [NO] .......................... [NO].......fused_lamb [NO] ....... ....................[OKAY][OKAY] [NO][OKAY] fused_lamb....... fused_lamb .............[OKAY].............fused_lamb [NO].............[NO] .......[NO]....... [OKAY][OKAY]....... sparse_attn [OKAY]............ [NO] ....... [OKAY] transformer sparse_attnsparse_attn............ ............ [NO] sparse_attn [NO]............ ....... ............ .......[NO][OKAY] [NO] ....... [OKAY].......stochastic_transformer[OKAY] [OKAY].transformer transformer [NO] ............transformer................... [NO] ............[NO] [OKAY] ..............[NO] ....... [OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformerstochastic_transformer. [NO]. ........[NO] [NO].......[OKAY] .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... -------------------------------------------------- [OKAY] utilsquantizer ................................ [YES][NO] ............. [OKAY][OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
[NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name--------------------------------------------------op name op name ................................ op name installed................ installed ................ .. installed installed..compatible compatible....-------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adam cpu_adam[OKAY] [YES] .................................... [YES] [YES][OKAY] ............fused_adam [OKAY].............[OKAY] [NO] ....... [OKAY]fused_adam ............. [NO]fused_adam fused_lamb ....... ............. .............fused_adam [OKAY] [NO][NO]............. fused_lamb ....... [NO]....... ............. [OKAY] ....... [OKAY][NO] [OKAY]....... [OKAY]fused_lamb fused_lamb .......................... [NO] [NO]....... sparse_attn....... ............[OKAY][OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer ............transformersparse_attn sparse_attn ........................[NO] [NO]...................[NO] [OKAY][NO].............. .......[OKAY][OKAY] stochastic_transformer [OKAY] .transformerstochastic_transformer [NO] transformer ............. ................... [NO] [OKAY][NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
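The block above is DeepSpeed's standard startup self-check; every launched rank prints an identical copy, which is why the raw log interleaves it many times over, and one representative copy is kept here. It can be regenerated on a node at any time with the ds_report utility that ships with DeepSpeed. A minimal sketch, assuming the training conda environment is already activated (the apt line is the exact fix suggested by the warning itself and needs root; ninja is the build tool the report says JIT compilation requires):

    # reprint the C++/CUDA extension op report shown above
    ds_report

    # optional: provide the libaio headers flagged by the async_io warning
    sudo apt install libaio-dev

    # ensure ninja is available so the [NO] ops can be JIT-compiled at runtime
    pip install ninja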
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at -------------------------------------------------- runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer ..............utils [NO].................. .......[YES] [OKAY]...... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- ....... [OKAY]-------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................op name................ op name installed installed ................ .................. .. installedinstalledcompatiblecompatible ..--------------------------------------------------..-------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [YES]cpu_adam[OKAY]cpu_adam ...... ............... ............... [OKAY] [YES][YES] ............ [OKAY]fused_adam [OKAY] ............. [NO] .......fused_adam [OKAY]............. [NO] fused_adam.......fused_adamfused_lamb .............[OKAY].......................... [NO][NO][NO] .......fused_lamb.............. [OKAY] ............. [OKAY] [OKAY] [NO] .......fused_lamb fused_lamb [OKAY] ............. ............. [NO] sparse_attn [NO] ....... 
............ ....... [OKAY] [NO] [OKAY] ....... sparse_attn[OKAY] ............ [NO] transformer....... ............[OKAY] [NO] .......sparse_attn transformersparse_attn [OKAY] ........................ ............ [NO][NO] stochastic_transformer[NO] .............. ........[OKAY][OKAY] [NO][OKAY] transformer stochastic_transformer....... transformer............[OKAY]. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja .................. .................. 
....................................[OKAY] [OKAY][OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name op name................................................ installedinstalled installed.................. ....compatibleinstalled compatible..--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adamcpu_adam ................................................... [YES][YES][OKAY][YES] ............ ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adam fused_adam[OKAY]............. ..........................fused_lamb[NO] [NO]....................[NO] [NO] [OKAY].............. ....... [OKAY] [OKAY][OKAY] fused_lamb DeepSpeed general environment info: .............fused_lamb fused_lamb [NO]............. .................... [NO][NO][OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] sparse_attn .......................... [OKAY][OKAY][NO] torch version .................... 1.8.1 ....... [OKAY] sparse_attn ............transformer [NO]............ .......[NO] sparse_attnsparse_attn[OKAY] torch cuda version ............... 11.1 ....... ............[OKAY]transformer............ nvcc version ..................... 11.2 ............[NO][NO] stochastic_transformer ....... [NO]....... ........[OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] [OKAY][OKAY][NO] transformer ................... transformerstochastic_transformer[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 [NO]. ...................[NO] [NO][OKAY]....... .......[OKAY] [OKAY]stochastic_transformer . stochastic_transformer[NO] ........ [OKAY][NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda version torch version............... 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version deepspeed install path............... ........... 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:deepspeed infonvcc version ................... .....................0.4.2+bc17042, bc17042, big-science 11.2deepspeed wheel compiled w. torch install path...... deepspeed install path ...............torch 1.8, cuda 11.1 ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info torch version................... .................... 0.4.2+bc17042, bc17042, big-science1.8.1 deepspeed wheel compiled w.torch cuda version ..................... 11.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO]async_io ...................... [NO][NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ....... [NO] torch version .................... 1.8.1 transformer_inference .. transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 torch version .................... 1.8.1 utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] torch cuda version ............... 11.1 quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] nvcc version ..................... 11.2 ---------------------------------------------------------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: -------------------------------------------------- torch install path torch install path............... ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... torch version1.8.1 .................... 1.8.1torch cuda version ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 ..................... deepspeed install path11.2 torch version .................... 1.8.1 ........... deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. utils[NO] ......................... [OKAY] [YES]-------------------------------------------------- ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES] .................... [NO][OKAY] ....... [OKAY] quantizer --------------------------------------------------.............. 
[NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... -------------------------------------------------- [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
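Every rank prints this same op-compatibility report at launch, which is why it recurs throughout the raw log. The report can be regenerated on any node with DeepSpeed's `ds_report` command; a minimal sketch, assuming the DeepSpeed 0.4.x layout where `ds_report` is a thin wrapper around `deepspeed.env_report.main` (the module path is an assumption about that version):

    # Regenerate the op-compatibility table and environment info shown above
    # (same output as running `ds_report` from the shell).
    from deepspeed.env_report import main  # assumed location in DeepSpeed 0.4.x

    main()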
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
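The versions in this block can be cross-checked from a Python shell in the same conda environment; `torch.__version__`, `torch.version.cuda`, and `deepspeed.__version__` are standard attributes of both packages:

    # Cross-check of the environment info reported above.
    import torch
    import deepspeed

    print("torch version ........", torch.__version__)     # expect 1.8.1
    print("torch cuda version ...", torch.version.cuda)    # expect 11.1
    print("deepspeed info .......", deepspeed.__version__) # expect 0.4.2+bc17042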
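The `async_io ............... [NO] ....... [NO]` row in the report above means the op is neither pre-installed nor JIT-compilable until the libaio headers are present (`apt install libaio-dev`, as the warning says). Whether the op has become buildable can be probed from Python; a sketch assuming the op-builder API of DeepSpeed 0.4.x (`AsyncIOBuilder` and `is_compatible()` are assumptions about that version's layout):

    # Probe whether the async_io op can now be JIT-built on this node.
    from deepspeed.ops.op_builder import AsyncIOBuilder  # assumed 0.4.x location

    print(AsyncIOBuilder().is_compatible())  # True once libaio-dev is installed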
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY][OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_ioasync_io .............................. [NO][NO] ....... .......[NO] [NO] torch version .................... 1.8.1 torch cuda version ............... 11.1 transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] nvcc version ..................... 11.2 utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ................. torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info..................... ...................11.2 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version torch cuda version.................... ...............1.8.1 11.1 async_io ............... [NO] ....... [NO] torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path transformer_inference .. [NO] ....... [OKAY] deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 utils .................. 
[YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info: torch versiontorch version ........................................ 1.8.11.8.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ...............async_io [NO] ...................... [NO][NO] async_io ............... [NO] .......transformer_inference [NO].. ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [NO] ....... [OKAY] DeepSpeed general environment info: [OKAY] utils ..................transformer_inference [YES].. ......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer .............. [NO] utils....... ..................[OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] torch version .................... 1.8.1 [YES] ...... [OKAY]-------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 ---------------------------------------------------------------------------------------------------- quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... DeepSpeed general environment info:1.8.1 torch cuda version ............... 11.1 torch install pathnvcc version .................................... 11.2 deepspeed install path ........... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info torch version................... ....................0.4.2+bc17042, bc17042, big-science 1.8.1deepspeed wheel compiled w. ......torch cuda version torch 1.8, cuda 11.1............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info:nvcc version ..................... 11.2 deepspeed install path ........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...............deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... DeepSpeed general environment info:0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 torch version torch version.................... ....................1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 1.8.1 torch cuda version torch cuda version............... ...............11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science 11.1nvcc version .....................nvcc version 11.2..................... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: torch install pathDeepSpeed general environment info: ............... ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ...... 
deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 torch install path['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version ............... torch version11.1 .................... nvcc version1.8.1 ..................... DeepSpeed general environment info: 11.2torch cuda version ...............deepspeed install path 11.1........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... deepspeed info11.2 ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer ..............  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.[NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
transformer_inference .. [NO] ....... [OKAY] utils async_io.................. [YES]............... ......[NO] [OKAY]....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc versionnvcc version .......................................... 11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 /bin/sh: line 0: type: git: not found torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... torch install path1.8.1 ............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ....................deepspeed install path 1.8.1........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...............deepspeed info 11.1................... nvcc version0.4.2+bc17042, bc17042, big-science ..................... deepspeed wheel compiled w.11.2 ...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... 11.1 torch version .................... 1.8.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 DeepSpeed general environment info:DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version torch cuda version............... ............... 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info:DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
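`async_io` is the one op flagged incompatible here, because the libaio headers are absent from the nodes. That is harmless for this run, but if NVMe offload were ever needed, compatibility can be probed up front; a sketch assuming the `AsyncIOBuilder` op builder exposed by recent DeepSpeed releases (the module path may differ in 0.4.2):

```python
# Probe whether the async_io extension could be JIT-compiled on this node,
# mirroring the [WARNING] above. Assumes deepspeed.ops.op_builder exposes
# AsyncIOBuilder, as recent releases do.
from deepspeed.ops.op_builder import AsyncIOBuilder

if not AsyncIOBuilder().is_compatible():
    print("async_io unavailable: install libaio-dev first (apt install libaio-dev)")
```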
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
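The first line fixes the 3D-parallel layout: 256 GPUs split across tensor, pipeline, and data parallelism, with the data-parallel size derived from the other two. A quick check of the arithmetic (a standalone sketch; the variable names are illustrative, not Megatron's):

```python
# Sanity-check the 3D-parallel layout logged above.
world_size = 256
tp = 4   # tensor-model-parallel size
pp = 8   # pipeline-model-parallel size

dp = world_size // (tp * pp)  # data-parallel size = world / (TP * PP)
assert dp == 8                # matches "data-parallel-size: 8" in the log
print(f"replicas: {dp}, GPUs per replica: {tp * pp}")
```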
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1185609.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 5
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 110
  exit_interval ................................... None
  ffn_hidden_size ................................. 20480
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  global_batch_size ............................... 2048
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 16384
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 512
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 10
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 6e-05
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 126953125
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 216320
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 6e-06
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 32
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 8
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['16', '16', '6_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 42
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 300000000
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 256
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
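The parallel layout reported above decomposes the 256 GPUs into data-parallel, tensor-model-parallel, and pipeline-model-parallel groups. A minimal sketch of the arithmetic, with the values taken straight from the log:

    # Every GPU sits at one (data, pipeline, tensor) coordinate, so the three
    # parallel degrees must multiply to the world size.
    world_size = 256
    tensor_model_parallel_size = 4
    pipeline_model_parallel_size = 8

    data_parallel_size = world_size // (tensor_model_parallel_size * pipeline_model_parallel_size)
    assert data_parallel_size == 8
    assert data_parallel_size * tensor_model_parallel_size * pipeline_model_parallel_size == world_size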
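The rampup message corresponds to rampup_batch_size ['16', '16', '6_000_000'] in the arguments: start at 16, grow in increments of 16, reach the target 2048 across 6,000,000 samples. A sketch of the implied schedule, assuming the increments are spread uniformly over the ramp (the real scheduler lives in Megatron's training loop; this only reproduces the arithmetic):

    start, increment, ramp_samples = 16, 16, 6_000_000
    target = 2048

    n_increments = (target - start) // increment          # 127 steps of +16
    samples_per_increment = ramp_samples / n_increments   # ~47,244 samples each

    def global_batch_size(consumed_samples: int) -> int:
        """Global batch size after `consumed_samples` training samples."""
        if consumed_samples >= ramp_samples:
            return target
        return start + increment * int(consumed_samples // samples_per_increment)

    print(global_batch_size(0), global_batch_size(100_000), global_batch_size(6_000_000))
    # 16 48 2048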
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
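The arguments point at deepspeed_config ./ds_config.1185609.json, which is not reproduced in this log. As a purely hypothetical reconstruction from the flags above (zero_stage 1, fp16 True, clip_grad 1.0, micro_batch_size 1), a config of this shape would use the standard DeepSpeed keys:

    # Hypothetical reconstruction of the unseen ds_config.1185609.json,
    # inferred from the argument dump; the actual file may differ.
    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "gradient_clipping": 1.0,
        "zero_optimization": {"stage": 1},
        "fp16": {
            "enabled": True,
            "loss_scale": 0,           # 0 => dynamic loss scaling
            "loss_scale_window": 1000,
            "hysteresis": 2,
            "min_loss_scale": 1.0,
        },
    }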
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
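The padding arithmetic follows from make_vocab_size_divisible_by (128) and tensor_model_parallel_size (4): the vocab is rounded up to a multiple of 128 × 4 = 512 so the embedding matrix shards evenly and stays aligned across tensor-parallel ranks. A sketch that reproduces the numbers:

    # Reproduces "> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)".
    orig_vocab_size = 50257
    make_vocab_size_divisible_by = 128
    tensor_model_parallel_size = 4

    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size  # 512
    padded = ((orig_vocab_size + multiple - 1) // multiple) * multiple
    print(padded, padded - orig_vocab_size)  # 50688 431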
> setting tensorboard ...
> setting codecarbon ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
> initializing torch distributed ...
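With distributed_backend set to nccl, this step boils down to a torch.distributed process-group setup on every rank. A minimal sketch of that kind of initialization, assuming the launcher exports the usual RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT variables (Megatron's real code additionally builds the tensor- and pipeline-parallel groups on top of this):

    import os
    import torch

    # env:// rendezvous: reads MASTER_ADDR / MASTER_PORT from the environment.
    torch.distributed.init_process_group(
        backend="nccl",                              # distributed_backend above
        world_size=int(os.environ["WORLD_SIZE"]),    # 256 in this run
        rank=int(os.environ["RANK"]),
    )
    # Bind this process to its local GPU before any collectives run.
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))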
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-25 02:35:45,964] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
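The model parallel seed of 2760 isn't arbitrary: it is consistent with Megatron-LM deriving per-rank CUDA seeds from the base seed plus a fixed offset of 2718 (an assumption based on Megatron-LM's `mpu/random.py`; worth verifying against the exact branch used here). A minimal sketch:

```python
# Sketch (assumed to mirror Megatron-LM's model_parallel_cuda_manual_seed):
# each tensor-parallel rank gets a distinct CUDA seed, while data-parallel
# ranks keep the base seed.
def model_parallel_cuda_seed(base_seed: int, tp_rank: int) -> int:
    offset = base_seed + 2718  # fixed offset used by Megatron-LM
    return offset + tp_rank

assert model_parallel_cuda_seed(42, 0) == 2760  # matches the log line above
```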
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.305 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.734 seconds
time to initialize megatron (seconds): -8.955
[after megatron is initialized] datetime: 2021-09-25 02:36:07
building GPT model ...
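The three fused kernels above are JIT-built through PyTorch's C++/CUDA extension machinery, which is what produces the ninja chatter; "ninja: no work to do" means the build directory already holds up-to-date artifacts, so this run only loads the cached modules. A minimal sketch of that pattern (the source file names here are hypothetical, not taken from the repo):

```python
# Sketch of the JIT build-and-load flow behind "Building extension module ..."
from torch.utils import cpp_extension

scaled_masked_softmax = cpp_extension.load(
    name="scaled_masked_softmax_cuda",  # module name as it appears in the log
    sources=[
        "scaled_masked_softmax.cpp",    # hypothetical file names
        "scaled_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,  # emits the ninja / "Loading extension module" lines seen above
)
```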
[2021-09-25 02:36:07,098] [INFO] [utils.py:680:see_memory_usage] Before Building Model /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved warnings.warn( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved warnings.warn( [2021-09-25 02:36:07,100] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-09-25 02:36:07,101] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.67 GB, percent = 19.6% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, 
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-25 02:36:08,503] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=7 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=1 layers=4 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=2 layers=4 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=3 layers=4 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=4 layers=4 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=5 layers=4 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=6 layers=4 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=7 layers=8 31: ParallelTransformerLayerPipe 32: ParallelTransformerLayerPipe 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe 35: 36: MixedFusedLayerNorm 37: EmbeddingPipe 38: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 
1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560 [2021-09-25 02:36:09,735] [INFO] [utils.py:680:see_memory_usage] After Building Model [2021-09-25 02:36:09,736] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB [2021-09-25 02:36:09,737] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792 setting training iterations to 159576 > learning rate decay style: cosine DeepSpeed is enabled. 
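The topology table and the per-stage parameter counts above are internally consistent. The rank layout is pipe-major and model-minor (PP=8, DP=8, TP=4, hence 256 ranks), and the gap between TOTAL_PARAMS and UNIQUE_PARAMS printed just below is exactly the tied word embedding, which lives in EmbeddingPipe on the first pipeline stage and reappears on the last one. A small check over the logged numbers:

```python
# Sanity checks against the ProcessCoord table and parameter counts in this log.
PP, DP, TP = 8, 8, 4  # pipeline, data, tensor parallel degrees

def global_rank(pipe: int, data: int, model: int) -> int:
    # pipe-major, model-minor ordering, matching the table above
    return (pipe * DP + data) * TP + model

assert global_rank(0, 0, 3) == 3     # ProcessCoord(pipe=0, data=0, model=3): 3
assert global_rank(1, 0, 0) == 32    # ProcessCoord(pipe=1, data=0, model=0): 32
assert global_rank(7, 7, 3) == 255   # last entry of the table

# Stage 0's per-rank surplus over a middle stage is its embedding shard; summed
# over TP=4 shards it equals TOTAL_PARAMS - UNIQUE_PARAMS, i.e. the embedding
# counted on both the first and the last stage.
stage0, middle = 1_986_465_792, 1_745_293_312
total, unique = 57_778_896_896, 56_814_206_976
assert (stage0 - middle) * TP == total - unique  # 964,689,920 duplicated params
```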
[2021-09-25 02:36:09,793] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science [2021-09-25 02:36:09,889] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False [2021-09-25 02:36:09,889] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-09-25 02:36:09,890] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer [2021-09-25 02:36:09,890] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-09-25 02:36:09,890] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-09-25 02:36:09,890] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-09-25 02:36:09,890] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000 [2021-09-25 02:36:09,890] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000 [2021-09-25 02:36:09,890] [INFO] [stage2.py:108:__init__] CPU Offload: False [2021-09-25 02:36:09,890] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False [2021-09-25 02:36:14,495] [INFO] [stage2.py:419:__init__] optimizer state initialized [2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-09-25 02:36:14,495] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-09-25 02:36:14,495] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-09-25 02:36:14,495] [INFO] [config.py:900:print] DeepSpeedEngine configuration: [2021-09-25 02:36:14,495] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-09-25 02:36:14,495] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-09-25 02:36:14,495] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False [2021-09-25 02:36:14,495] [INFO] [config.py:904:print] amp_enabled .................. False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] amp_params ................... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] disable_allgather ............ False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dump_state ................... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_enabled ........... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 
100 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] eigenvalue_verbose ........... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] elasticity_enabled ........... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_enabled ................. True [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] global_rank .................. 0 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_clipping ............ 1.0 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] loss_scale ................... 0 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] memory_breakdown ............. False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_name ............... None [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] optimizer_params ............. None [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_enabled .................. False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] pld_params ................... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] prescale_gradients ........... False [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_groups .............. 1 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_offset .............. 1000 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_period .............. 1000 [2021-09-25 02:36:14,496] [INFO] [config.py:904:print] quantize_rounding ............ 0 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_start_bits .......... 16 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_target_bits ......... 8 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_training_enabled .... False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_type ................ 0 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] quantize_verbose ............. False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_name ............... None [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] scheduler_params ............. None [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_attention ............. None [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] steps_per_print .............. 
2000 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_enabled .......... False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] tensorboard_output_path ...... [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_batch_size ............. 2048 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] use_quantizer_kernel ......... False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] wall_clock_breakdown ......... False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] world_size ................... 8 [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_allow_untested_optimizer False [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_enabled ................. True [2021-09-25 02:36:14,497] [INFO] [config.py:904:print] zero_optimization_stage ...... 1 [2021-09-25 02:36:14,497] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-25 02:36:14,497] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1 [2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) 
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,038] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 02:36:15,039] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 167 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 56 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 104 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 110 successfully loaded 8 ZeRO state_dicts for rank 58 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 170 successfully
loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 44 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 200 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 171 successfully loaded 8 ZeRO state_dicts for rank 169 successfully loaded 8 ZeRO state_dicts for rank 20 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 211 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 21 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 51 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 174 successfully loaded 8 ZeRO state_dicts for rank 23 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 205 loading 8 zero partition checkpoints for rank 180 successfully loaded 8 ZeRO state_dicts for rank 190 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 196 loading 8 zero partition checkpoints for rank 206 successfully loaded 8 ZeRO state_dicts for rank 165 loading 8 zero partition checkpoints for rank 108 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 13 successfully loaded 8 ZeRO state_dicts for rank 36 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:41 CEST)" 
was missed by 0:00:03.258297 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 162 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 22 successfully loaded 8 ZeRO state_dicts for rank 210 loading 8 zero partition checkpoints for rank 183 successfully loaded 8 ZeRO state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 130 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:36:42 CEST)" was missed by 0:00:03.400033 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 92 loading 8 zero partition checkpoints for rank 167 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 118 successfully loaded 8 ZeRO state_dicts for rank 19 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 106 loading 8 zero partition checkpoints for rank 56 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition checkpoints for rank 112 successfully loaded 8 ZeRO state_dicts for 
rank 240 successfully loaded 8 ZeRO state_dicts for rank 213 loading 8 zero partition checkpoints for rank 60 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 0 successfully loaded 8 ZeRO state_dicts for rank 191 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 73 loading 8 zero partition checkpoints for rank 168 successfully loaded 8 ZeRO state_dicts for rank 238 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 94 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 3 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 117 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 133 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 242 loading 8 zero partition checkpoints for rank 52 successfully loaded 8 ZeRO state_dicts for rank 239 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 225 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 146 loading 8 zero partition checkpoints for rank 104 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 53 loading 8 zero partition checkpoints for rank 176 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 234 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 226 loading 8 zero partition checkpoints for rank 110 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 228 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 152 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 85 successfully loaded 8 ZeRO state_dicts for rank 154 loading 8 zero partition checkpoints for rank 127 
successfully loaded 8 ZeRO state_dicts for rank 24 successfully loaded 8 ZeRO state_dicts for rank 241 loading 8 zero partition checkpoints for rank 96 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 232 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 172 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 38 loading 8 zero partition checkpoints for rank 63 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 58 successfully loaded 8 ZeRO state_dicts for rank 151 loading 8 zero partition checkpoints for rank 204 successfully loaded 8 ZeRO state_dicts for rank 155 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 230 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 29 loading 8 zero partition checkpoints for rank 62 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 124 loading 8 zero partition checkpoints for rank 109 successfully loaded 8 ZeRO state_dicts for rank 247 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 153 loading 8 zero partition checkpoints for rank 113 successfully loaded 8 ZeRO state_dicts for rank 251 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 235 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 214 successfully loaded 8 ZeRO state_dicts for rank 254 successfully loaded 8 ZeRO state_dicts for rank 229 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 211 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 221 loading 8 zero partition checkpoints for rank 143 successfully loaded 8 ZeRO state_dicts for rank 66 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 15 successfully loaded 8 ZeRO state_dicts for rank 231 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 160 successfully loaded 8 ZeRO state_dicts for rank 253 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 223 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 61 loading 8 zero partition checkpoints for rank 120 loading 8 
zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 140 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 100 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 76 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 255 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 199 loading 8 zero partition checkpoints for rank 166 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 157 loading 8 zero partition checkpoints for rank 82 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 215 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 87 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 132 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 65 loading 8 zero partition checkpoints for rank 125 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 196 loading 8 zero partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 142 loading 8 zero partition checkpoints for rank 136 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 88 loading 8 zero partition checkpoints for rank 67 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 73 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints 
for rank 139 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 212 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 57 loading 8 zero partition checkpoints for rank 191 loading 8 zero partition checkpoints for rank 23 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 220 loading 8 zero partition checkpoints for rank 40 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 150 loading 8 zero partition checkpoints for rank 156 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 68 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 192 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 93 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 239 loading 8 zero partition checkpoints for rank 48 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 155 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 75 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition 
checkpoints for rank 77 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 72 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 50 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 32 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 30 successfully loaded 8 ZeRO state_dicts for rank 5 loading 8 zero partition checkpoints for rank 5 successfully loaded 8 ZeRO state_dicts for rank 6 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 7 loading 8 zero partition checkpoints for rank 6 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 02:38:42 CEST)" was missed by 0:00:03.040753 loading 8 zero partition checkpoints for rank 4 
loading 8 zero partition checkpoints for rank 7
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 5827
time (ms) | load-checkpoint: 94708.03
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 02:37:49
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.199121 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.460 seconds
    total number of samples: 394611670
    total number of epochs: 3
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.335 seconds
    total number of samples: 6927161
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.163 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 02:37:56
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 102787.57 | train/valid/test-data-iterators-setup: 6275.52
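For scale, the dataset numbers above multiply out as follows. This is only a back-of-the-envelope sketch using values copied from the log:

```python
# Token math for the GPT dataset build above.
seq_len = 2048                       # the "2048sl" in the index-map filenames
train_target_samples = 300_000_000   # requested train samples ("train: 300000000")
indexed_samples = 394_611_670        # "total number of samples" across 3 epochs

print(f"requested train tokens: {train_target_samples * seq_len / 1e9:.1f}B")  # ~614.4B
print(f"samples per epoch:      {indexed_samples / 3:,.0f}")                   # ~131.5M
```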
[before the start of training step] datetime: 2021-09-25 02:37:56
[2021-09-25 02:37:56,930] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 02:37:56,931] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 225] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 226] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6865234375 | reserved: 20752.0 | max reserved: 20752.0
[Rank 224] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.6875 | reserved: 22492.0 | max reserved: 22492.0
[Rank 0] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 23246.0 | max reserved: 23246.0
[Rank 3] (after 5830 iterations) memory (MB) | allocated: 6685.79931640625 | max allocated: 13590.94921875 | reserved: 22862.0 | max reserved: 22862.0
[Rank 227] (after 5830 iterations) memory (MB) | allocated: 7107.7109375 | max allocated: 11885.68701171875 | reserved: 22492.0 | max reserved: 22492.0
iteration 5830/ 159576 | consumed samples: 168368 | elapsed time per iteration (ms): 21875.4 | learning rate: 4.656E-05 | global batch size: 64 | lm loss: 6.454423E+00 | loss scale: 2048.0 | grad norm: 45630.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 65] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19902.0 | max reserved: 19902.0
[Rank 33] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 19866.0 | max reserved: 19866.0
[Rank 97] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 66] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19890.0 | max reserved: 19890.0
[Rank 34] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20370.0 | max reserved: 20370.0
[Rank 193] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 161] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19146.0 | max reserved: 19146.0
[Rank 129] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19582.0 | max reserved: 19582.0
[Rank 162] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 130] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19434.0 | max reserved: 19434.0
[Rank 98] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19674.0 | max reserved: 19674.0
[Rank 194] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19066.0 | max reserved: 19066.0
[Rank 64] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20536.0 | max reserved: 20536.0
[Rank 32] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20408.0 | max reserved: 20408.0
[Rank 99] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19838.0 | max reserved: 19838.0
[Rank 131] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19502.0 | max reserved: 19502.0
[Rank 67] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19902.0 | max reserved: 19902.0
[Rank 35] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 19866.0 | max reserved: 19866.0
[Rank 192] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19012.0 | max reserved: 19012.0
[Rank 128] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19908.0 | max reserved: 19908.0
[Rank 160] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19636.0 | max reserved: 19636.0
[Rank 96] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19988.0 | max reserved: 19988.0
[Rank 163] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19306.0 | max reserved: 19306.0
[Rank 195] (after 5830 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
iteration 5840/ 159576 | consumed samples: 169008 | elapsed time per iteration (ms): 16822.3 | learning rate: 4.674E-05 | global batch size: 64 | lm loss: 6.392004E+00 | loss scale: 2048.0 | grad norm: 53106.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5850/ 159576 | consumed samples: 169648 | elapsed time per iteration (ms): 16813.6 | learning rate: 4.692E-05 | global batch size: 64 | lm loss: 6.347363E+00 | loss scale: 2048.0 | grad norm: 53512.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5860/ 159576 | consumed samples: 170288 | elapsed time per iteration (ms): 16773.5 | learning rate: 4.709E-05 | global batch size: 64 | lm loss: 6.368040E+00 | loss scale: 2048.0 | grad norm: 49687.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5870/ 159576 | consumed samples: 170928 | elapsed time per iteration (ms): 16844.9 | learning rate: 4.727E-05 | global batch size: 64 | lm loss: 6.372821E+00 | loss scale: 2048.0 | grad norm: 49107.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5880/ 159576 | consumed samples: 171568 | elapsed time per iteration (ms): 16812.2 | learning rate: 4.745E-05 | global batch size: 64 | lm loss: 6.379050E+00 | loss scale: 2048.0 | grad norm: 76898.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5890/ 159576 | consumed samples: 172208 | elapsed time per iteration (ms): 16819.7 | learning rate: 4.763E-05 | global batch size: 64 | lm loss: 6.333071E+00 | loss scale: 2048.0 | grad norm: 69874.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5900/ 159576 | consumed samples: 172848 | elapsed time per iteration (ms): 16821.3 | learning rate: 4.780E-05 | global batch size: 64 | lm loss: 6.354385E+00 | loss scale: 2048.0 | grad norm: 57915.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5910/ 159576 | consumed samples: 173488 | elapsed time per iteration (ms): 16679.9 | learning rate: 4.798E-05 | global batch size: 64 | lm loss: 6.361916E+00 | loss scale: 2048.0 | grad norm: 56535.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5920/ 159576 | consumed samples: 174128 | elapsed time per iteration (ms): 16731.8 | learning rate: 4.816E-05 | global batch size: 64 | lm loss: 6.371978E+00 | loss scale: 2048.0 | grad norm: 75613.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 03:08:32] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 03:08:32] PULSE: tr8-104B is running for 33:04 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
iteration 5940/ 159576 | consumed samples: 175408 | elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | grad norm: 61103.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5950/ 159576 | consumed samples: 176048 | elapsed time per iteration (ms): 16548.2 | learning rate: 4.869E-05 | global batch size: 64 | lm loss: 6.365273E+00 | loss scale: 2048.0 | grad norm: 74137.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5960/ 159576 | consumed samples: 176688 | elapsed time per iteration (ms): 16720.7 | learning rate: 4.887E-05 | global batch size: 64 | lm loss: 6.339179E+00 | loss scale: 2048.0 | grad norm: 117906.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5970/ 159576 | consumed samples: 177328 | elapsed time per iteration (ms): 16666.6 | learning rate: 4.905E-05 | global batch size: 64 | lm loss: 6.366007E+00 | loss scale: 2048.0 | grad norm: 135736.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5980/ 159576 | consumed samples: 177968 | elapsed time per iteration (ms): 16712.0 | learning rate: 4.922E-05 | global batch size: 64 | lm loss: 6.311417E+00 | loss scale: 2048.0 | grad norm: 59672.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 5990/ 159576 | consumed samples: 178608 | elapsed time per iteration (ms): 16795.9 | learning rate: 4.940E-05 | global batch size: 64 | lm loss: 6.346366E+00 | loss scale: 2048.0 | grad norm: 70394.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 03:26:24,359] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=13, lr=[4.9579881656804734e-05, 4.9579881656804734e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 6000 loss: 6.4051 iter time (s): 0.008 samples/sec: 7888.018
iteration 6000/ 159576 | consumed samples: 179248 | elapsed time per iteration (ms): 16825.1 | learning rate: 4.958E-05 | global batch size: 64 | lm loss: 6.338142E+00 | loss scale: 2048.0 | grad norm: 51469.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 6000 | lm loss value: 6.305492E+00 | lm loss PPL: 5.475711E+02 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 03:26:46,630] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18535.85
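As a quick consistency check on the validation line above: the logged perplexity is just the exponential of the lm loss, so the two numbers should agree to within rounding:

```python
import math

# "lm loss value: 6.305492E+00 | lm loss PPL: 5.475711E+02" from the validation block above
assert abs(math.exp(6.305492) - 5.475711e2) < 0.01
```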
iteration 6010/ 159576 | consumed samples: 179888 | elapsed time per iteration (ms): 19605.0 | learning rate: 4.976E-05 | global batch size: 64 | lm loss: 6.332598E+00 | loss scale: 2048.0 | grad norm: 64216.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6020/ 159576 | consumed samples: 180528 | elapsed time per iteration (ms): 16682.2 | learning rate: 4.993E-05 | global batch size: 64 | lm loss: 6.346989E+00 | loss scale: 2048.0 | grad norm: 65052.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6030/ 159576 | consumed samples: 181168 | elapsed time per iteration (ms): 16536.1 | learning rate: 5.011E-05 | global batch size: 64 | lm loss: 6.314711E+00 | loss scale: 2048.0 | grad norm: 61186.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6040/ 159576 | consumed samples: 181808 | elapsed time per iteration (ms): 16509.4 | learning rate: 5.029E-05 | global batch size: 64 | lm loss: 6.347876E+00 | loss scale: 2048.0 | grad norm: 80684.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6050/ 159576 | consumed samples: 182448 | elapsed time per iteration (ms): 16821.6 | learning rate: 5.047E-05 | global batch size: 64 | lm loss: 6.345741E+00 | loss scale: 2048.0 | grad norm: 207970.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6060/ 159576 | consumed samples: 183088 | elapsed time per iteration (ms): 16815.3 | learning rate: 5.064E-05 | global batch size: 64 | lm loss: 6.341463E+00 | loss scale: 2048.0 | grad norm: 57913.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6070/ 159576 | consumed samples: 183728 | elapsed time per iteration (ms): 16825.8 | learning rate: 5.082E-05 | global batch size: 64 | lm loss: 6.336625E+00 | loss scale: 2048.0 | grad norm: 62496.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6080/ 159576 | consumed samples: 184368 | elapsed time per iteration (ms): 16749.3 | learning rate: 5.100E-05 | global batch size: 64 | lm loss: 6.378619E+00 | loss scale: 2048.0 | grad norm: 53421.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6090/ 159576 | consumed samples: 185008 | elapsed time per iteration (ms): 16844.2 | learning rate: 5.118E-05 | global batch size: 64 | lm loss: 6.363810E+00 | loss scale: 2048.0 | grad norm: 53621.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6100/ 159576 | consumed samples: 185648 | elapsed time per iteration (ms): 16803.1 | learning rate: 5.136E-05 | global batch size: 64 | lm loss: 6.397610E+00 | loss scale: 2048.0 | grad norm: 63234.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6110/ 159576 | consumed samples: 186288 | elapsed time per iteration (ms): 16808.5 | learning rate: 5.153E-05 | global batch size: 64 | lm loss: 6.359557E+00 | loss scale: 2048.0 | grad norm: 52582.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6120/ 159576 | consumed samples: 186928 | elapsed time per iteration (ms): 16792.9 | learning rate: 5.171E-05 | global batch size: 64 | lm loss: 6.347573E+00 | loss scale: 2048.0 | grad norm: 50959.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6130/ 159576 | consumed samples: 187568 | elapsed time per iteration (ms): 16806.7 | learning rate: 5.189E-05 | global batch size: 64 | lm loss: 6.351057E+00 | loss scale: 2048.0 | grad norm: 152670.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6140/ 159576 | consumed samples: 188208 | elapsed time per iteration (ms): 16808.0 | learning rate: 5.207E-05 | global batch size: 64 | lm loss: 6.374673E+00 | loss scale: 2048.0 | grad norm: 50742.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 04:08:28] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 04:08:28] PULSE: tr8-104B is running for 1:33:00 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
iteration 6150/ 159576 | consumed samples: 188848 | elapsed time per iteration (ms): 16696.6 | learning rate: 5.224E-05 | global batch size: 64 | lm loss: 6.323299E+00 | loss scale: 2048.0 | grad norm: 55101.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6160/ 159576 | consumed samples: 189600 | elapsed time per iteration (ms): 17385.3 | learning rate: 5.245E-05 | global batch size: 80 | lm loss: 6.368839E+00 | loss scale: 2048.0 | grad norm: 51296.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6170/ 159576 | consumed samples: 190400 | elapsed time per iteration (ms): 17823.6 | learning rate: 5.267E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 2048.0 | grad norm: 85490.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6180/ 159576 | consumed samples: 191200 | elapsed time per iteration (ms): 17757.4 | learning rate: 5.289E-05 | global batch size: 80 | lm loss: 6.373211E+00 | loss scale: 2048.0 | grad norm: 112584.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6190/ 159576 | consumed samples: 192000 | elapsed time per iteration (ms): 17583.1 | learning rate: 5.312E-05 | global batch size: 80 | lm loss: 6.372861E+00 | loss scale: 2048.0 | grad norm: 102723.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6200/ 159576 | consumed samples: 192800 | elapsed time per iteration (ms): 17380.3 | learning rate: 5.334E-05 | global batch size: 80 | lm loss: 6.336594E+00 | loss scale: 2048.0 | grad norm: 41950.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6210/ 159576 | consumed samples: 193600 | elapsed time per iteration (ms): 17443.3 | learning rate: 5.356E-05 | global batch size: 80 | lm loss: 6.355303E+00 | loss scale: 4096.0 | grad norm: 131493.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 04:26:31,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6210/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6210 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 19730.09
[exiting program after 110.16574249267578 minutes] datetime: 2021-09-25 04:26:46
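The jump in global batch size from 64 to 80 above is the batch-size rampup kicking in, and it also shows up in the consumed-samples counter: the per-10-iteration delta grows from 640 to 800, with a 752-sample window straddling the transition. If you ever need to recover the effective batch size from these logs, the deltas between iteration lines are enough. A small ad-hoc sketch over three excerpted records (the regex is mine, not part of the training tooling):

```python
import re

# Three "iteration" records copied from the log above, truncated after the
# consumed-samples field; the parsing here is ad hoc, not from Megatron/DeepSpeed.
log = """\
iteration 6150/ 159576 | consumed samples: 188848 | ...
iteration 6160/ 159576 | consumed samples: 189600 | ...
iteration 6170/ 159576 | consumed samples: 190400 | ...
"""
recs = re.findall(r"iteration\s+(\d+)/\s*\d+ \| consumed samples:\s*(\d+)", log)
for (i0, s0), (i1, s1) in zip(recs, recs[1:]):
    gbs = (int(s1) - int(s0)) / (int(i1) - int(i0))
    print(f"avg global batch size over iterations {i0}->{i1}: {gbs:.1f}")
# 6150->6160 averages 75.2 (the ramp happened mid-window), then a steady 80.0
```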
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
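The launcher pins OMP_NUM_THREADS to 1 per process, as the banner above says, and suggests tuning it. A minimal sketch of one way to pick a value per rank before any heavy imports; the cores-per-rank heuristic and the use of LOCAL_WORLD_SIZE (set by torchrun-style launchers) are our assumptions, not part of this training setup:

import os

# Split the node's cores evenly across local ranks; never go below 1.
local_world = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
cores = os.cpu_count() or 1
os.environ.setdefault("OMP_NUM_THREADS", str(max(1, cores // local_world)))

# Import torch only after the thread count is pinned, so OpenMP picks it up.
import torch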
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
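The async_io row reads [NO]/[NO] because the libaio headers are absent, which only matters if the run enables DeepSpeed's AIO/NVMe offload paths. A minimal check sketch, assuming a Linux host; the printed messages are ours, and the quoted fix is the one from the warning above:

import ctypes.util

# ldconfig lookup: returns e.g. "libaio.so.1" when the library is installed.
if ctypes.util.find_library("aio") is None:
    print("libaio missing: async_io stays [NO]; fix per the warning above, "
          "e.g. `apt install libaio-dev`")
else:
    print("libaio found: DeepSpeed can JIT-compile the async_io op")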
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . --------------------------------------------------[NO] ....... -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam ............. [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] DeepSpeed general environment info: -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] cpu_adam ............... [YES] ...... [OKAY] torch version .................... 1.8.1 fused_adam ............. [NO] ....... [OKAY] torch cuda version ............... 11.1 fused_lamb ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 sparse_attn ............ [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] transformer ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science stochastic_transformer . [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
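Every rank prints this same report and environment info at startup, which is why it appears interleaved many times over in the raw logs. To re-check a node without wading through training output, the report can be regenerated on demand; a minimal sketch, assuming the ds_report utility bundled with this DeepSpeed install is on the PATH:

    ds_report                 # prints the extension op report and the general environment info
    apt install libaio-dev    # clears the async_io warning, per the log's own suggestion (needs root)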
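Note that only cpu_adam was prebuilt ([YES]); the remaining ops report [NO] and would be JIT-compiled via ninja on first use, which can add a noticeable stall at startup. DeepSpeed can instead prebuild ops at install time; a sketch, assuming the stock DS_BUILD_* install flags also apply to the DeepSpeed-big-science fork used here:

    DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install .   # prebuild selected ops from a source checkout
    DS_BUILD_OPS=1 pip install .                                # or prebuild all compatible ops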
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 nvcc version torch version..................... ....................11.2 1.8.1deepspeed install path ........... torch cuda version ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.1deepspeed info nvcc version................... .....................0.4.2+bc17042, bc17042, big-science 11.2 deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op reportstochastic_transformer -------------------------------------------------- . NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.[NO] --------------------------------------------------....... JIT compiled ops requires ninja[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- async_io ............... [NO] ....... [NO] ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ninja .................. [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. 
[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... utils[OKAY] .................. [YES] ...... utils[OKAY] .................. [YES]quantizer .................... [NO] ....... [OKAY][OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] ....... .......[NO] [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------sparse_attn ............ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja [NO]DeepSpeed C++/CUDA extension op report ....... --------------------------------------------------[OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer --------------------------------------------------............ JIT compiled ops requires ninja[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
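Every rank prints the same op report and environment dump at startup, so any one copy can be regenerated offline from a login node with DeepSpeed's own diagnostic entry point. A minimal sketch, assuming the deepspeed.env_report module layout of the 0.4.x wheel used here (the ds_report console script wraps the same function):

    # reproduce_ds_report.py -- print the op compatibility table and the
    # "DeepSpeed general environment info" block shown above for this machine.
    import deepspeed.env_report as env_report

    if __name__ == "__main__":
        env_report.main()  # same output as the `ds_report` console script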
torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................DeepSpeed general environment info: 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] ninja .................. [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... .................... 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path ...........deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... DeepSpeed general environment info:11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch install path ................... ...............0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed general environment info: DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
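Every row in this table is answered by a DeepSpeed op builder: "installed" means the extension was compiled when the wheel was built, while "compatible" means the system has what ninja needs to JIT-compile it on first use. A sketch of querying one builder directly, assuming the op_builder interface of the DeepSpeed 0.4.x line shown in the log (builder names and locations differ in later versions):

    # probe_op.py -- ask an op builder the same questions ds_report asks
    # (sketch against the 0.4.x deepspeed.ops.op_builder interface)
    from deepspeed.ops.op_builder import CPUAdamBuilder

    builder = CPUAdamBuilder()
    print("compatible:", builder.is_compatible())

    # load() returns the prebuilt extension if it exists, otherwise it
    # JIT-compiles with ninja on first use, as the NOTE above describes
    op_module = builder.load()
    print("loaded:", op_module.__name__)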
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
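This block does not have to be scraped out of interleaved job logs; DeepSpeed can regenerate it on demand. A sketch, assuming the deepspeed.env_report module that backs the ds_report console script in this version:

    # report_env.py -- regenerate the "general environment info" block
    # (sketch; deepspeed.env_report backs the ds_report CLI in 0.4.x)
    import torch
    import deepspeed
    from deepspeed.env_report import main as ds_report

    print("torch", torch.__version__, "cuda", torch.version.cuda)
    print("deepspeed", deepspeed.__version__)
    ds_report()  # prints the op report and environment info seen above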
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: ninja .................. [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch cuda version ............... 11.1 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc version ..................... 11.2 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc versionDeepSpeed general environment info: ..................... 11.2 -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- deepspeed install path ........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] JIT compiled ops requires ninja torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ 
[NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... 
[YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 ninja .................. [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja ninja.................. [OKAY].................. --------------------------------------------------[OKAY] op name --------------------------------------------------................ op nameinstalled .................. compatibleinstalled -------------------------------------------------- .. compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... [OKAY]............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam .............fused_lamb .............[NO] [NO]....... ....... [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attn ........................ [NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformertransformer ............. [NO][NO] ....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info:DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 quantizer .............. [NO] ....... [OKAY] deepspeed install pathdeepspeed install path ...................... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 11.20.4.2+bc17042, bc17042, big-science deepspeed install path deepspeed wheel compiled w............ ...... torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- async_io ............... [NO] ....... [NO] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer_inference [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] async_ioquantizer ............................. [NO][NO] .............. [NO][OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed general environment info:deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................torch install path 0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version .................... ...............1.8.1 torch cuda version ............... 11.1 nvcc version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']..................... 11.2 JIT compiled ops requires ninja deepspeed install path torch version........... ....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 1.8.1deepspeed info ................... 0.4.2+bc17042, bc17042, big-sciencetorch cuda version deepspeed wheel compiled w................ ...... 11.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2 deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed general environment info: fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] torch install path ............... DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch install pathtorch version ................................... 1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.1 transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found torch versionnvcc version ......................................... 1.8.111.2 stochastic_transformer . [NO] ....... [OKAY] deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO] ...................... [NO][NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ......quantizer [OKAY].............. [NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. /bin/sh: line 0: type: git: not found -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. [YES] utils...... [OKAY] .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... 1.8.1torch version .................... torch cuda version1.8.1 ............... 
11.1torch cuda version nvcc version............... .....................11.1 DeepSpeed general environment info: 11.2nvcc version deepspeed install path..................... ...........11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed info ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 .....................deepspeed install path 11.2........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']................... deepspeed info0.4.2+bc17042, bc17042, big-science ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... 
[NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] JIT compiled ops requires ninja transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
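Every launched rank prints this op report at startup, which is why the raw log repeats it dozens of times; only one copy is kept above. The same report can be regenerated once, offline, with DeepSpeed's bundled ds_report utility. A minimal sketch, assuming the conda environment shown below is active:

# Minimal sketch: reproduce the extension-op report and environment info
# above without launching a training job. ds_report ships with DeepSpeed
# (here 0.4.2+bc17042) and prints the same tables a single time.
import subprocess

subprocess.run(["ds_report"], check=True)

Running this on a login node before submitting a job is a cheap way to catch a missing build dependency such as ninja.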
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
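The "type: git: not found" lines explain the git_hash=unknown fields: Megatron records the code revision by shelling out to git, and git is not on $PATH on these compute nodes, so the lookup falls back to "unknown". A minimal sketch of the equivalent check (a hypothetical reconstruction, not Megatron's actual code):

# Hypothetical reconstruction of the lookup that produces the
# "git_hash=unknown git_branch=unknown" lines when git is missing.
import shutil
import subprocess

def git_info():
    if shutil.which("git") is None:  # what fails on these compute nodes
        return "unknown", "unknown"
    git_hash = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
    git_branch = subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
    return git_hash, git_branch

h, b = git_info()
print(f"**** Git info for Megatron: git_hash={h} git_branch={b} ****")

The warning is harmless for training itself; making git available in the job's environment would record the real hash and branch instead.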
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
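async_io is the only op reported as unavailable, and the warning names the fix: the JIT build needs libaio's headers, which `apt install libaio-dev` provides on Debian/Ubuntu systems. A quick sanity check that the runtime library is at least resolvable, as a sketch; note that find_library cannot tell whether the -dev headers needed at compile time are installed:

# Check whether libaio's shared library is visible to the dynamic loader.
# The async_io op additionally needs the libaio-dev headers when DeepSpeed
# JIT-compiles it, which this check cannot detect.
from ctypes.util import find_library

path = find_library("aio")  # e.g. "libaio.so.1" when the library is present
print("libaio runtime library:", path or "not found")

Since the async_io op is only needed for DeepSpeed's async/NVMe offload features, which this training does not use, the [NO] result here is expected and safe to ignore.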
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] .......
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.1 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.1 nvcc versiontorch version ......................................... 
11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+bc17042, bc17042, big-science11.2 deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 
11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_ioutils ................................. [NO][YES] ............. [NO][OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference-------------------------------------------------- .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [OKAY][YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... ....... [OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
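A note on reading the op report above: [YES] in the `installed` column means the extension was pre-compiled into this DeepSpeed build (here only cpu_adam), while [NO] ....... [OKAY] means the op is not built yet but the system can JIT-compile it with ninja on first use. Below is a minimal sketch of probing and triggering such a JIT build by hand; it assumes a `deepspeed.ops.op_builder` interface with `FusedAdamBuilder`, `is_compatible()` and `load()`, which may differ in this pinned 0.4.2 fork:

```python
# Sketch: probe and JIT-build one DeepSpeed extension op.
# Assumption: deepspeed.ops.op_builder exposes per-op builders as below.
from deepspeed.ops.op_builder import FusedAdamBuilder

builder = FusedAdamBuilder()
print(builder.name, "compatible:", builder.is_compatible())  # the [OKAY] column

# load() returns the compiled extension, invoking ninja to JIT-build it
# on first use -- the step the report's NOTE refers to.
fused_adam = builder.load()
print("fused_adam op loaded:", fused_adam is not None)
```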
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 8
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1186600.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 5
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 20480
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  global_batch_size ............................... 2048
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 16384
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 512
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 10
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 6e-05
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 126953125
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 216320
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 6e-06
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 32
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 8
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['16', '16', '6_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 42
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 300000000
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 256
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
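Two sanity checks fall out of the dump above. First, the parallelism layout multiplies out: 8 data-parallel replicas × 4 tensor-parallel shards × 8 pipeline stages = 256 ranks, matching world_size; with micro_batch_size 1, the full global batch of 2048 therefore implies 2048 / (1 × 8) = 256 gradient-accumulation steps. Second, rampup_batch_size = ['16', '16', '6_000_000'] is exactly the schedule the "will use batch size rampup" line spells out: start at a global batch size of 16 and add 16 after each equal share of the 6,000,000 rampup samples until 2048 is reached. A minimal sketch of that implied schedule (batch_size_at is a made-up helper, not Megatron's code):

```python
# Batch-size rampup implied by the log line above:
# start at 16, add 16 per step, reaching 2048 over 6_000_000 samples.
START, INCREMENT, FINAL, RAMPUP_SAMPLES = 16, 16, 2048, 6_000_000

def batch_size_at(consumed_samples: int) -> int:
    """Global batch size after `consumed_samples` training samples (hypothetical helper)."""
    steps = (FINAL - START) // INCREMENT                        # 127 increments of 16
    step = min(steps, consumed_samples * steps // RAMPUP_SAMPLES)
    return START + step * INCREMENT

print(batch_size_at(0))          # 16
print(batch_size_at(3_000_000))  # 1024, roughly halfway up the ramp
print(batch_size_at(6_000_000))  # 2048, ramp complete
```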
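The same dump also pins down the learning-rate curve: linear warmup from 0 to lr=6e-05 over the first lr_warmup_samples=216,320 samples, then cosine decay down to min_lr=6e-06 by lr_decay_samples=126,953,125. A sketch of that sample-based schedule, assuming the usual Megatron-style annealing shape (lr_at is a hypothetical helper, not the project's scheduler code):

```python
import math

# Learning-rate schedule implied by the arguments above (sketch).
LR, MIN_LR = 6e-05, 6e-06
WARMUP_SAMPLES, DECAY_SAMPLES = 216_320, 126_953_125

def lr_at(consumed_samples: int) -> float:
    """Learning rate after `consumed_samples` samples (hypothetical helper)."""
    if consumed_samples < WARMUP_SAMPLES:            # linear warmup
        return LR * consumed_samples / WARMUP_SAMPLES
    if consumed_samples > DECAY_SAMPLES:             # flat tail at min_lr
        return MIN_LR
    # cosine decay from LR to MIN_LR between end of warmup and end of decay
    progress = (consumed_samples - WARMUP_SAMPLES) / (DECAY_SAMPLES - WARMUP_SAMPLES)
    return MIN_LR + 0.5 * (LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

print(lr_at(216_320))      # 6e-05 at the end of warmup
print(lr_at(126_953_125))  # 6e-06 once fully decayed
```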
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> setting tensorboard ...
> setting codecarbon ...
> initializing torch distributed ...
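The padded-vocab line above is pure arithmetic on two of the arguments: the GPT-2 vocabulary of 50,257 tokens is rounded up to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size = 128 × 4 = 512, so that each of the 4 tensor-parallel shards holds an equally sized slice of the embedding matrix. A few lines reproduce the number (pad_vocab is a made-up name, not Megatron's function):

```python
# Reproduce "> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)".
def pad_vocab(vocab_size: int, divisible_by: int = 128, tp_size: int = 4) -> int:
    multiple = divisible_by * tp_size  # 512: keeps every TP shard equally sized
    return ((vocab_size + multiple - 1) // multiple) * multiple

padded = pad_vocab(50_257)
print(padded, padded - 50_257)  # 50688 431
```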
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
[2021-09-25 04:27:14,118] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.302 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
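The seed line from checkpointing.py above is also deterministic arithmetic: Megatron offsets the base seed by a constant (2718) plus the tensor-parallel rank, so dropout masks differ across model-parallel ranks while data-parallel replicas stay in lockstep. A sketch of that convention (the constant comes from Megatron's `model_parallel_cuda_manual_seed`; treat the exact formula as an assumption for other versions):

```python
# Megatron-style seed derivation: base seed 42 -> model parallel seed 2760
# for tensor-parallel rank 0, as in the checkpointing.py log line above.
def model_parallel_seed(base_seed: int, tp_rank: int) -> int:
    return base_seed + 2718 + tp_rank

print(model_parallel_seed(42, 0))   # 2760
```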
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
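The compiler warning above (emitted once per rank) means `c++` on the compute node resolves to something other than the g++ PyTorch was built with. Since `torch.utils.cpp_extension` honors the `CXX` environment variable, pointing it at g++ before the first JIT build should silence it; a sketch, assuming g++ is actually available on the node:

```python
# Possible workaround for the WRONG_COMPILER_WARNING above: force the JIT
# extension builds to use g++. Must run before the first kernel compile;
# the equivalent in the Slurm launch script would be `export CXX=g++`.
import os

os.environ.setdefault("CXX", "g++")
```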
>>> done with compiling and loading fused kernels. Compilation time: 23.417 seconds
time to initialize megatron (seconds): 25.843
[after megatron is initialized] datetime: 2021-09-25 04:27:37
building GPT model ...
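The fused-kernel step above goes through PyTorch's JIT extension loader; "ninja: no work to do" just means the build directory already holds an up-to-date build, so the modules load from cache. A minimal sketch of the same mechanism (source file names below are illustrative placeholders, not Megatron's actual paths):

```python
# Minimal sketch of JIT-building a CUDA extension the way
# megatron/fused_kernels does; the source files here are hypothetical.
from torch.utils.cpp_extension import load

scaled_softmax = load(
    name="scaled_upper_triang_masked_softmax_cuda",
    sources=[
        "scaled_upper_triang_masked_softmax.cpp",
        "scaled_upper_triang_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,  # prints the "Emitting ninja build file ..." lines seen above
)
```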
[2021-09-25 04:27:37,906] [INFO] [utils.py:680:see_memory_usage] Before Building Model /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved warnings.warn( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved warnings.warn( [2021-09-25 04:27:37,908] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-09-25 04:27:37,908] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.69 GB, percent = 19.6% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, 
model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, 
ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, 
model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-25 04:27:39,312] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
[2021-09-25 04:27:40,518] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-25 04:27:40,519] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-25 04:27:40,519] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.87 GB, percent = 19.7%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
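To decode the topology map and the parameter counts above: ranks are laid out with the tensor-parallel (`model`) axis varying fastest, then data, then pipe, i.e. TP=4, DP=8, PP=8 across the 256 GPUs. Below is a minimal sketch of the arithmetic the map implies, not DeepSpeed's topology code; the embedding dimensions in the last two checks are inferred from the logged numbers, not read from any config.

```python
# Rank layout implied by the "Using topology" map: model varies fastest,
# then data, then pipe (TP=4, DP=8, PP=8).
TP, DP, PP = 4, 8, 8

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DP + data) * TP + model

assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 32    # first rank of pipeline stage 1
assert coord_to_rank(7, 7, 3) == 255   # last entry in the map

# The two distinct per-rank parameter counts also line up: the first and
# last stages add an embedding shard on top of their four transformer
# layers. Inferred dims: padded vocab 50688, hidden 16384, seq len 2048.
assert 1986465792 - 1745293312 == 50688 * 16384 // 4 + 2048 * 16384
assert 1986498560 - 1986465792 == 2 * 16384   # last stage's final layernorm
```

Note that every stage ends up with exactly four ParallelTransformerLayerPipe blocks; the `type:transformer` partitioning method balances only those, letting the embeddings and the final norm ride along on the edge stages.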
[2021-09-25 04:27:40,540] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-25 04:27:40,690] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-25 04:27:40,690] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-25 04:27:40,690] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-25 04:27:40,690] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-25 04:27:40,690] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-25 04:27:40,690] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-25 04:27:40,690] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-25 04:27:40,691] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-25 04:27:40,691] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-25 04:27:45,267] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-25 04:27:45,267] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-25 04:27:45,267] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-25 04:27:45,267] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-25 04:27:45,268] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-25 04:27:45,269] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-25 04:27:45,270] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-25 04:27:45,270] [INFO] [config.py:906:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-25 04:27:45,270] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=130 STAGE=4 LAYERS=4 [19, 23)
STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=97 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=98 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=99 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=195 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=193 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=194 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=65 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=67 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=66 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=163 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=162 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=161 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 
(57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=225 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=226 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=227 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=33 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=34 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-25 04:27:45,575] [INFO] [engine.py:134:__init__] RANK=35 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 216320 for warmup iterations > using checkpoint value 126953125 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 8 ZeRO state_dicts for rank 48 successfully loaded 8 ZeRO state_dicts for rank 156 successfully loaded 8 ZeRO state_dicts for rank 223 successfully loaded 8 ZeRO state_dicts for rank 220 successfully loaded 8 ZeRO state_dicts for rank 49 successfully loaded 8 ZeRO state_dicts for rank 142 successfully loaded 8 ZeRO state_dicts for rank 221 successfully loaded 8 ZeRO state_dicts for rank 75 successfully loaded 8 ZeRO state_dicts for rank 50 successfully loaded 8 ZeRO state_dicts for rank 158 successfully loaded 8 ZeRO state_dicts for rank 33 successfully loaded 8 ZeRO state_dicts for rank 184 successfully loaded 8 ZeRO state_dicts for rank 216 successfully loaded 8 ZeRO state_dicts for rank 215 successfully loaded 8 ZeRO state_dicts for rank 157 successfully loaded 8 ZeRO state_dicts for rank 141 successfully loaded 8 ZeRO state_dicts for rank 222 successfully loaded 8 ZeRO state_dicts for rank 108 successfully loaded 8 ZeRO state_dicts for rank 58 successfully loaded 8 ZeRO state_dicts for rank 104 successfully loaded 8 ZeRO state_dicts for rank 100 successfully loaded 8 ZeRO state_dicts for rank 213 successfully loaded 8 ZeRO state_dicts for rank 91 successfully loaded 8 ZeRO state_dicts for rank 89 successfully loaded 8 ZeRO state_dicts for rank 40 successfully loaded 8 ZeRO state_dicts for rank 35 successfully loaded 8 ZeRO state_dicts for rank 144 successfully loaded 8 ZeRO state_dicts for rank 140 successfully loaded 8 ZeRO state_dicts for rank 60 successfully loaded 8 ZeRO state_dicts for rank 73 successfully loaded 8 ZeRO state_dicts 
for rank 51 successfully loaded 8 ZeRO state_dicts for rank 72 successfully loaded 8 ZeRO state_dicts for rank 159 successfully loaded 8 ZeRO state_dicts for rank 212 successfully loaded 8 ZeRO state_dicts for rank 146 successfully loaded 8 ZeRO state_dicts for rank 214 successfully loaded 8 ZeRO state_dicts for rank 143 successfully loaded 8 ZeRO state_dicts for rank 164 successfully loaded 8 ZeRO state_dicts for rank 34 successfully loaded 8 ZeRO state_dicts for rank 52 successfully loaded 8 ZeRO state_dicts for rank 131 successfully loaded 8 ZeRO state_dicts for rank 132 successfully loaded 8 ZeRO state_dicts for rank 45 successfully loaded 8 ZeRO state_dicts for rank 211 successfully loaded 8 ZeRO state_dicts for rank 61 successfully loaded 8 ZeRO state_dicts for rank 154 successfully loaded 8 ZeRO state_dicts for rank 88 successfully loaded 8 ZeRO state_dicts for rank 87 successfully loaded 8 ZeRO state_dicts for rank 90 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 53 successfully loaded 8 ZeRO state_dicts for rank 116 successfully loaded 8 ZeRO state_dicts for rank 59 successfully loaded 8 ZeRO state_dicts for rank 128 successfully loaded 8 ZeRO state_dicts for rank 165 successfully loaded 8 ZeRO state_dicts for rank 192 successfully loaded 8 ZeRO state_dicts for rank 210 successfully loaded 8 ZeRO state_dicts for rank 129 successfully loaded 8 ZeRO state_dicts for rank 93 successfully loaded 8 ZeRO state_dicts for rank 62 successfully loaded 8 ZeRO state_dicts for rank 81 successfully loaded 8 ZeRO state_dicts for rank 203 successfully loaded 8 ZeRO state_dicts for rank 76 successfully loaded 8 ZeRO state_dicts for rank 127 successfully loaded 8 ZeRO state_dicts for rank 38 successfully loaded 8 ZeRO state_dicts for rank 160 successfully loaded 8 ZeRO state_dicts for rank 113 successfully loaded 8 ZeRO state_dicts for rank 63 successfully loaded 8 ZeRO state_dicts for rank 145 successfully loaded 8 ZeRO state_dicts for rank 36 successfully loaded 8 ZeRO state_dicts for rank 57 successfully loaded 8 ZeRO state_dicts for rank 99 successfully loaded 8 ZeRO state_dicts for rank 67 successfully loaded 8 ZeRO state_dicts for rank 32 successfully loaded 8 ZeRO state_dicts for rank 47 successfully loaded 8 ZeRO state_dicts for rank 147 successfully loaded 8 ZeRO state_dicts for rank 112 successfully loaded 8 ZeRO state_dicts for rank 150 successfully loaded 8 ZeRO state_dicts for rank 178 successfully loaded 8 ZeRO state_dicts for rank 166 successfully loaded 8 ZeRO state_dicts for rank 161 successfully loaded 8 ZeRO state_dicts for rank 219 successfully loaded 8 ZeRO state_dicts for rank 120 successfully loaded 8 ZeRO state_dicts for rank 56 loading 8 zero partition checkpoints for rank 223 successfully loaded 8 ZeRO state_dicts for rank 54 successfully loaded 8 ZeRO state_dicts for rank 130 successfully loaded 8 ZeRO state_dicts for rank 79 successfully loaded 8 ZeRO state_dicts for rank 218 successfully loaded 8 ZeRO state_dicts for rank 65 successfully loaded 8 ZeRO state_dicts for rank 115 successfully loaded 8 ZeRO state_dicts for rank 85 loading 8 zero partition checkpoints for rank 156 successfully loaded 8 ZeRO state_dicts for rank 109 successfully loaded 8 ZeRO state_dicts for rank 209 successfully loaded 8 ZeRO state_dicts for rank 152 successfully loaded 8 ZeRO state_dicts for rank 83 successfully loaded 8 ZeRO state_dicts for rank 103 successfully loaded 8 ZeRO state_dicts for rank 66 successfully loaded 8 ZeRO state_dicts 
for rank 44 successfully loaded 8 ZeRO state_dicts for rank 74 successfully loaded 8 ZeRO state_dicts for rank 96 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 151 successfully loaded 8 ZeRO state_dicts for rank 171 successfully loaded 8 ZeRO state_dicts for rank 135 successfully loaded 8 ZeRO state_dicts for rank 14 successfully loaded 8 ZeRO state_dicts for rank 64 successfully loaded 8 ZeRO state_dicts for rank 196 successfully loaded 8 ZeRO state_dicts for rank 123 successfully loaded 8 ZeRO state_dicts for rank 136 successfully loaded 8 ZeRO state_dicts for rank 181 successfully loaded 8 ZeRO state_dicts for rank 55 successfully loaded 8 ZeRO state_dicts for rank 228 loading 8 zero partition checkpoints for rank 48 successfully loaded 8 ZeRO state_dicts for rank 124 successfully loaded 8 ZeRO state_dicts for rank 170 successfully loaded 8 ZeRO state_dicts for rank 208 successfully loaded 8 ZeRO state_dicts for rank 105 successfully loaded 8 ZeRO state_dicts for rank 95 successfully loaded 8 ZeRO state_dicts for rank 134 successfully loaded 8 ZeRO state_dicts for rank 153 successfully loaded 8 ZeRO state_dicts for rank 204 successfully loaded 8 ZeRO state_dicts for rank 125 successfully loaded 8 ZeRO state_dicts for rank 111 successfully loaded 8 ZeRO state_dicts for rank 133 successfully loaded 8 ZeRO state_dicts for rank 149 successfully loaded 8 ZeRO state_dicts for rank 194 successfully loaded 8 ZeRO state_dicts for rank 148 successfully loaded 8 ZeRO state_dicts for rank 217 successfully loaded 8 ZeRO state_dicts for rank 206 successfully loaded 8 ZeRO state_dicts for rank 114 successfully loaded 8 ZeRO state_dicts for rank 200 loading 8 zero partition checkpoints for rank 220 successfully loaded 8 ZeRO state_dicts for rank 202 successfully loaded 8 ZeRO state_dicts for rank 138 successfully loaded 8 ZeRO state_dicts for rank 139 successfully loaded 8 ZeRO state_dicts for rank 37 successfully loaded 8 ZeRO state_dicts for rank 176 successfully loaded 8 ZeRO state_dicts for rank 168 successfully loaded 8 ZeRO state_dicts for rank 98 successfully loaded 8 ZeRO state_dicts for rank 101 successfully loaded 8 ZeRO state_dicts for rank 39 successfully loaded 8 ZeRO state_dicts for rank 107 successfully loaded 8 ZeRO state_dicts for rank 42 successfully loaded 8 ZeRO state_dicts for rank 8 successfully loaded 8 ZeRO state_dicts for rank 186 successfully loaded 8 ZeRO state_dicts for rank 94 loading 8 zero partition checkpoints for rank 142 successfully loaded 8 ZeRO state_dicts for rank 77 successfully loaded 8 ZeRO state_dicts for rank 137 successfully loaded 8 ZeRO state_dicts for rank 207 successfully loaded 8 ZeRO state_dicts for rank 172 successfully loaded 8 ZeRO state_dicts for rank 199 successfully loaded 8 ZeRO state_dicts for rank 43 successfully loaded 8 ZeRO state_dicts for rank 69 successfully loaded 8 ZeRO state_dicts for rank 205 successfully loaded 8 ZeRO state_dicts for rank 167 successfully loaded 8 ZeRO state_dicts for rank 41 successfully loaded 8 ZeRO state_dicts for rank 80 successfully loaded 8 ZeRO state_dicts for rank 119 successfully loaded 8 ZeRO state_dicts for rank 106 successfully loaded 8 ZeRO state_dicts for rank 187 successfully loaded 8 ZeRO state_dicts for rank 197 successfully loaded 8 ZeRO state_dicts for rank 92 successfully loaded 8 ZeRO state_dicts for rank 236 successfully loaded 8 ZeRO state_dicts for rank 97 successfully loaded 8 ZeRO state_dicts for rank 155 successfully loaded 8 ZeRO 
state_dicts for rank 82 successfully loaded 8 ZeRO state_dicts for rank 185 successfully loaded 8 ZeRO state_dicts for rank 78 successfully loaded 8 ZeRO state_dicts for rank 10 successfully loaded 8 ZeRO state_dicts for rank 71 successfully loaded 8 ZeRO state_dicts for rank 68 successfully loaded 8 ZeRO state_dicts for rank 195 successfully loaded 8 ZeRO state_dicts for rank 102 successfully loaded 8 ZeRO state_dicts for rank 70 successfully loaded 8 ZeRO state_dicts for rank 26 successfully loaded 8 ZeRO state_dicts for rank 180 successfully loaded 8 ZeRO state_dicts for rank 117 loading 8 zero partition checkpoints for rank 75 successfully loaded 8 ZeRO state_dicts for rank 121 successfully loaded 8 ZeRO state_dicts for rank 174 successfully loaded 8 ZeRO state_dicts for rank 24 loading 8 zero partition checkpoints for rank 50 successfully loaded 8 ZeRO state_dicts for rank 179 successfully loaded 8 ZeRO state_dicts for rank 248 successfully loaded 8 ZeRO state_dicts for rank 46 successfully loaded 8 ZeRO state_dicts for rank 12 successfully loaded 8 ZeRO state_dicts for rank 126 successfully loaded 8 ZeRO state_dicts for rank 169 loading 8 zero partition checkpoints for rank 216 loading 8 zero partition checkpoints for rank 215 successfully loaded 8 ZeRO state_dicts for rank 11 successfully loaded 8 ZeRO state_dicts for rank 183 successfully loaded 8 ZeRO state_dicts for rank 162 loading 8 zero partition checkpoints for rank 222 loading 8 zero partition checkpoints for rank 108 successfully loaded 8 ZeRO state_dicts for rank 182 successfully loaded 8 ZeRO state_dicts for rank 27 successfully loaded 8 ZeRO state_dicts for rank 252 successfully loaded 8 ZeRO state_dicts for rank 224 successfully loaded 8 ZeRO state_dicts for rank 201 successfully loaded 8 ZeRO state_dicts for rank 240 successfully loaded 8 ZeRO state_dicts for rank 190 loading 8 zero partition checkpoints for rank 141 loading 8 zero partition checkpoints for rank 221 successfully loaded 8 ZeRO state_dicts for rank 193 successfully loaded 8 ZeRO state_dicts for rank 231 successfully loaded 8 ZeRO state_dicts for rank 175 successfully loaded 8 ZeRO state_dicts for rank 122 successfully loaded 8 ZeRO state_dicts for rank 13 loading 8 zero partition checkpoints for rank 157 successfully loaded 8 ZeRO state_dicts for rank 110 successfully loaded 8 ZeRO state_dicts for rank 233 successfully loaded 8 ZeRO state_dicts for rank 118 loading 8 zero partition checkpoints for rank 184 successfully loaded 8 ZeRO state_dicts for rank 198 successfully loaded 8 ZeRO state_dicts for rank 30 successfully loaded 8 ZeRO state_dicts for rank 163 successfully loaded 8 ZeRO state_dicts for rank 244 successfully loaded 8 ZeRO state_dicts for rank 16 successfully loaded 8 ZeRO state_dicts for rank 18 successfully loaded 8 ZeRO state_dicts for rank 250 successfully loaded 8 ZeRO state_dicts for rank 2 successfully loaded 8 ZeRO state_dicts for rank 25 successfully loaded 8 ZeRO state_dicts for rank 230 successfully loaded 8 ZeRO state_dicts for rank 235 successfully loaded 8 ZeRO state_dicts for rank 31 successfully loaded 8 ZeRO state_dicts for rank 177 successfully loaded 8 ZeRO state_dicts for rank 28 successfully loaded 8 ZeRO state_dicts for rank 238 loading 8 zero partition checkpoints for rank 60 loading 8 zero partition checkpoints for rank 144 loading 8 zero partition checkpoints for rank 104 loading 8 zero partition checkpoints for rank 213 loading 8 zero partition checkpoints for rank 89 loading 8 zero partition checkpoints for rank 40 
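An aside, since this wall of messages goes on for a while: the repeated counts are consistent with ZeRO stage 1 sharding the fp32 optimizer state across the data-parallel group, which has 8 replicas here, so each model-parallel slot's optimizer checkpoint is spread over 8 shard files; on resume a rank appears to load all 8 shards and then re-select the partition it owns, hence "successfully loaded 8 ZeRO state_dicts" followed by "loading 8 zero partition checkpoints" per rank. A toy sketch of that layout, not DeepSpeed's actual loading code:

```python
# Toy sketch of ZeRO-1 optimizer-state sharding across a DP group of 8,
# matching the "8 ZeRO state_dicts ... for rank N" messages above.
# Illustrative only; the data and shapes here are made up.
DP = 8
flat_state = list(range(32))                # stand-in for the flat fp32 state
n = len(flat_state) // DP
shards = [flat_state[i * n:(i + 1) * n] for i in range(DP)]  # 8 checkpoint shards

def restore_partition(dp_rank: int) -> list:
    merged = [x for s in shards for x in s]        # read all 8 shards back
    return merged[dp_rank * n:(dp_rank + 1) * n]   # keep only the owned slice

assert restore_partition(3) == shards[3]
```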
successfully loaded 8 ZeRO state_dicts for rank 239 loading 8 zero partition checkpoints for rank 140 successfully loaded 8 ZeRO state_dicts for rank 191 loading 8 zero partition checkpoints for rank 91 loading 8 zero partition checkpoints for rank 100 successfully loaded 8 ZeRO state_dicts for rank 173 successfully loaded 8 ZeRO state_dicts for rank 232 successfully loaded 8 ZeRO state_dicts for rank 22 loading 8 zero partition checkpoints for rank 52 successfully loaded 8 ZeRO state_dicts for rank 188 successfully loaded 8 ZeRO state_dicts for rank 249 successfully loaded 8 ZeRO state_dicts for rank 189 successfully loaded 8 ZeRO state_dicts for rank 237 successfully loaded 8 ZeRO state_dicts for rank 253 successfully loaded 8 ZeRO state_dicts for rank 229 successfully loaded 8 ZeRO state_dicts for rank 29 successfully loaded 8 ZeRO state_dicts for rank 226 successfully loaded 8 ZeRO state_dicts for rank 251 loading 8 zero partition checkpoints for rank 212 successfully loaded 8 ZeRO state_dicts for rank 17 successfully loaded 8 ZeRO state_dicts for rank 241 loading 8 zero partition checkpoints for rank 214 successfully loaded 8 ZeRO state_dicts for rank 9 successfully loaded 8 ZeRO state_dicts for rank 255 successfully loaded 8 ZeRO state_dicts for rank 15 successfully loaded 8 ZeRO state_dicts for rank 245 loading 8 zero partition checkpoints for rank 211 successfully loaded 8 ZeRO state_dicts for rank 246 loading 8 zero partition checkpoints for rank 87 successfully loaded 8 ZeRO state_dicts for rank 242 successfully loaded 8 ZeRO state_dicts for rank 227 successfully loaded 8 ZeRO state_dicts for rank 243 successfully loaded 8 ZeRO state_dicts for rank 247 loading 8 zero partition checkpoints for rank 143 successfully loaded 8 ZeRO state_dicts for rank 19 loading 8 zero partition checkpoints for rank 116 loading 8 zero partition checkpoints for rank 132 loading 8 zero partition checkpoints for rank 88 successfully loaded 8 ZeRO state_dicts for rank 20 loading 8 zero partition checkpoints for rank 49 loading 8 zero partition checkpoints for rank 128 loading 8 zero partition checkpoints for rank 154 loading 8 zero partition checkpoints for rank 165 loading 8 zero partition checkpoints for rank 62 successfully loaded 8 ZeRO state_dicts for rank 254 loading 8 zero partition checkpoints for rank 93 successfully loaded 8 ZeRO state_dicts for rank 225 loading 8 zero partition checkpoints for rank 81 loading 8 zero partition checkpoints for rank 127 loading 8 zero partition checkpoints for rank 76 loading 8 zero partition checkpoints for rank 99 loading 8 zero partition checkpoints for rank 57 successfully loaded 8 ZeRO state_dicts for rank 0 loading 8 zero partition checkpoints for rank 90 loading 8 zero partition checkpoints for rank 73 successfully loaded 8 ZeRO state_dicts for rank 1 successfully loaded 8 ZeRO state_dicts for rank 234 loading 8 zero partition checkpoints for rank 166 successfully loaded 8 ZeRO state_dicts for rank 3 loading 8 zero partition checkpoints for rank 84 loading 8 zero partition checkpoints for rank 113 loading 8 zero partition checkpoints for rank 147 loading 8 zero partition checkpoints for rank 219 loading 8 zero partition checkpoints for rank 51 loading 8 zero partition checkpoints for rank 72 loading 8 zero partition checkpoints for rank 58 loading 8 zero partition checkpoints for rank 160 loading 8 zero partition checkpoints for rank 56 loading 8 zero partition checkpoints for rank 158 loading 8 zero partition checkpoints for rank 65 loading 8 zero 
partition checkpoints for rank 130 loading 8 zero partition checkpoints for rank 115 loading 8 zero partition checkpoints for rank 67 successfully loaded 8 ZeRO state_dicts for rank 21 loading 8 zero partition checkpoints for rank 209 loading 8 zero partition checkpoints for rank 109 loading 8 zero partition checkpoints for rank 44 loading 8 zero partition checkpoints for rank 74 loading 8 zero partition checkpoints for rank 86 loading 8 zero partition checkpoints for rank 45 loading 8 zero partition checkpoints for rank 83 loading 8 zero partition checkpoints for rank 171 loading 8 zero partition checkpoints for rank 136 successfully loaded 8 ZeRO state_dicts for rank 23 loading 8 zero partition checkpoints for rank 218 loading 8 zero partition checkpoints for rank 159 loading 8 zero partition checkpoints for rank 196 loading 8 zero partition checkpoints for rank 66 loading 8 zero partition checkpoints for rank 125 loading 8 zero partition checkpoints for rank 111 loading 8 zero partition checkpoints for rank 181 loading 8 zero partition checkpoints for rank 151 loading 8 zero partition checkpoints for rank 64 loading 8 zero partition checkpoints for rank 134 loading 8 zero partition checkpoints for rank 85 loading 8 zero partition checkpoints for rank 206 loading 8 zero partition checkpoints for rank 120 loading 8 zero partition checkpoints for rank 37 loading 8 zero partition checkpoints for rank 146 loading 8 zero partition checkpoints for rank 95 loading 8 zero partition checkpoints for rank 194 loading 8 zero partition checkpoints for rank 202 loading 8 zero partition checkpoints for rank 178 loading 8 zero partition checkpoints for rank 138 loading 8 zero partition checkpoints for rank 170 loading 8 zero partition checkpoints for rank 55 loading 8 zero partition checkpoints for rank 61 loading 8 zero partition checkpoints for rank 101 loading 8 zero partition checkpoints for rank 124 loading 8 zero partition checkpoints for rank 135 loading 8 zero partition checkpoints for rank 148 loading 8 zero partition checkpoints for rank 139 loading 8 zero partition checkpoints for rank 14 loading 8 zero partition checkpoints for rank 77 loading 8 zero partition checkpoints for rank 39 loading 8 zero partition checkpoints for rank 152 loading 8 zero partition checkpoints for rank 59 loading 8 zero partition checkpoints for rank 80 loading 8 zero partition checkpoints for rank 106 loading 8 zero partition checkpoints for rank 69 loading 8 zero partition checkpoints for rank 79 loading 8 zero partition checkpoints for rank 47 loading 8 zero partition checkpoints for rank 203 loading 8 zero partition checkpoints for rank 94 loading 8 zero partition checkpoints for rank 186 loading 8 zero partition checkpoints for rank 217 loading 8 zero partition checkpoints for rank 97 loading 8 zero partition checkpoints for rank 92 loading 8 zero partition checkpoints for rank 71 loading 8 zero partition checkpoints for rank 164 loading 8 zero partition checkpoints for rank 41 loading 8 zero partition checkpoints for rank 103 loading 8 zero partition checkpoints for rank 131 loading 8 zero partition checkpoints for rank 197 loading 8 zero partition checkpoints for rank 112 loading 8 zero partition checkpoints for rank 145 loading 8 zero partition checkpoints for rank 180 loading 8 zero partition checkpoints for rank 70 loading 8 zero partition checkpoints for rank 63 loading 8 zero partition checkpoints for rank 123 loading 8 zero partition checkpoints for rank 137 loading 8 zero partition checkpoints for rank 
82 loading 8 zero partition checkpoints for rank 150 loading 8 zero partition checkpoints for rank 68 loading 8 zero partition checkpoints for rank 228 loading 8 zero partition checkpoints for rank 187 loading 8 zero partition checkpoints for rank 205 loading 8 zero partition checkpoints for rank 8 loading 8 zero partition checkpoints for rank 46 loading 8 zero partition checkpoints for rank 117 loading 8 zero partition checkpoints for rank 185 loading 8 zero partition checkpoints for rank 183 loading 8 zero partition checkpoints for rank 168 loading 8 zero partition checkpoints for rank 133 loading 8 zero partition checkpoints for rank 155 loading 8 zero partition checkpoints for rank 176 loading 8 zero partition checkpoints for rank 119 loading 8 zero partition checkpoints for rank 153 loading 8 zero partition checkpoints for rank 121 loading 8 zero partition checkpoints for rank 42 loading 8 zero partition checkpoints for rank 102 loading 8 zero partition checkpoints for rank 96 loading 8 zero partition checkpoints for rank 236 loading 8 zero partition checkpoints for rank 201 loading 8 zero partition checkpoints for rank 179 loading 8 zero partition checkpoints for rank 162 loading 8 zero partition checkpoints for rank 182 loading 8 zero partition checkpoints for rank 43 loading 8 zero partition checkpoints for rank 107 loading 8 zero partition checkpoints for rank 129 loading 8 zero partition checkpoints for rank 110 loading 8 zero partition checkpoints for rank 38 loading 8 zero partition checkpoints for rank 126 loading 8 zero partition checkpoints for rank 105 loading 8 zero partition checkpoints for rank 193 loading 8 zero partition checkpoints for rank 118 loading 8 zero partition checkpoints for rank 248 loading 8 zero partition checkpoints for rank 114 loading 8 zero partition checkpoints for rank 122 loading 8 zero partition checkpoints for rank 200 loading 8 zero partition checkpoints for rank 33 loading 8 zero partition checkpoints for rank 177 loading 8 zero partition checkpoints for rank 149 loading 8 zero partition checkpoints for rank 36 loading 8 zero partition checkpoints for rank 233 loading 8 zero partition checkpoints for rank 53 loading 8 zero partition checkpoints for rank 161 loading 8 zero partition checkpoints for rank 12 loading 8 zero partition checkpoints for rank 244 loading 8 zero partition checkpoints for rank 78 loading 8 zero partition checkpoints for rank 30 loading 8 zero partition checkpoints for rank 98 loading 8 zero partition checkpoints for rank 204 loading 8 zero partition checkpoints for rank 16 loading 8 zero partition checkpoints for rank 169 loading 8 zero partition checkpoints for rank 28 loading 8 zero partition checkpoints for rank 199 loading 8 zero partition checkpoints for rank 230 loading 8 zero partition checkpoints for rank 224 loading 8 zero partition checkpoints for rank 35 loading 8 zero partition checkpoints for rank 240 loading 8 zero partition checkpoints for rank 167 loading 8 zero partition checkpoints for rank 54 loading 8 zero partition checkpoints for rank 210 loading 8 zero partition checkpoints for rank 27 loading 8 zero partition checkpoints for rank 10 loading 8 zero partition checkpoints for rank 190 loading 8 zero partition checkpoints for rank 192 loading 8 zero partition checkpoints for rank 34 loading 8 zero partition checkpoints for rank 252 loading 8 zero partition checkpoints for rank 163 loading 8 zero partition checkpoints for rank 13 loading 8 zero partition checkpoints for rank 207 loading 8 zero partition 
checkpoints for rank 191 loading 8 zero partition checkpoints for rank 32 loading 8 zero partition checkpoints for rank 231 loading 8 zero partition checkpoints for rank 26 loading 8 zero partition checkpoints for rank 9 loading 8 zero partition checkpoints for rank 255 loading 8 zero partition checkpoints for rank 11 loading 8 zero partition checkpoints for rank 175 loading 8 zero partition checkpoints for rank 241 loading 8 zero partition checkpoints for rank 25 loading 8 zero partition checkpoints for rank 189 loading 8 zero partition checkpoints for rank 17 loading 8 zero partition checkpoints for rank 24 loading 8 zero partition checkpoints for rank 245 loading 8 zero partition checkpoints for rank 208 loading 8 zero partition checkpoints for rank 198 loading 8 zero partition checkpoints for rank 254 loading 8 zero partition checkpoints for rank 237 loading 8 zero partition checkpoints for rank 188 loading 8 zero partition checkpoints for rank 251 loading 8 zero partition checkpoints for rank 225 loading 8 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 8 zero partition checkpoints for rank 253 loading 8 zero partition checkpoints for rank 229 loading 8 zero partition checkpoints for rank 250 loading 8 zero partition checkpoints for rank 195 loading 8 zero partition checkpoints for rank 173 loading 8 zero partition checkpoints for rank 1 loading 8 zero partition checkpoints for rank 234 loading 8 zero partition checkpoints for rank 15 loading 8 zero partition checkpoints for rank 239 loading 8 zero partition checkpoints for rank 247 loading 8 zero partition checkpoints for rank 246 loading 8 zero partition checkpoints for rank 172 loading 8 zero partition checkpoints for rank 249 loading 8 zero partition checkpoints for rank 238 loading 8 zero partition checkpoints for rank 31 loading 8 zero partition checkpoints for rank 243 loading 8 zero partition checkpoints for rank 242 loading 8 zero partition checkpoints for rank 174 loading 8 zero partition checkpoints for rank 226 loading 8 zero partition checkpoints for rank 29 loading 8 zero partition checkpoints for rank 18 loading 8 zero partition checkpoints for rank 227 loading 8 zero partition checkpoints for rank 19 loading 8 zero partition checkpoints for rank 2 loading 8 zero partition checkpoints for rank 235 loading 8 zero partition checkpoints for rank 232 loading 8 zero partition checkpoints for rank 3 loading 8 zero partition checkpoints for rank 22 loading 8 zero partition checkpoints for rank 20 loading 8 zero partition checkpoints for rank 21 loading 8 zero partition checkpoints for rank 23 successfully loaded 8 ZeRO state_dicts for rank 6 loading 8 zero partition checkpoints for rank 6 successfully loaded 8 ZeRO state_dicts for rank 7 successfully loaded 8 ZeRO state_dicts for rank 4 successfully loaded 8 ZeRO state_dicts for rank 5 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-25 04:30:09 CEST)" was missed by 0:00:03.764782 loading 8 zero partition checkpoints for rank 4 loading 8 zero partition checkpoints for rank 7 loading 8 zero partition checkpoints for rank 5 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210 time (ms) | load-checkpoint: 91691.46 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-25 04:29:17 > building train, validation, and test datasets ... 
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.138486 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.350 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.276 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.080 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-25 04:29:23
done with setup ...
training ...
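Before the iteration log resumes, a few consistency checks on the numbers printed above, sketched in plain Python. Assumptions are flagged inline: `world_size=8` in the config dump is read as the data-parallel size, the per-epoch sample count is inferred as total samples divided by epochs, and the restored scheduler is assumed to count warmup in consumed samples despite the "iterations" label.

```python
import math

# 1) DeepSpeed's batch identity from the config dump above:
#    train_batch_size = micro_batch * grad_accum_steps * dp_world_size,
#    where dp_world_size = 256 GPUs / (TP=4 * PP=8) = 8 = world_size.
assert 1 * 256 * (256 // (4 * 8)) == 2048

# 2) "total number of epochs: 3" for the train split: a 300M-sample budget
#    needs three passes at ~131.5M samples per epoch (inferred from the
#    394611670 total samples spanning 3 epochs, so this is a consistency
#    check rather than a derivation).
samples_per_epoch = 394_611_670 // 3
assert math.ceil(300_000_000 / samples_per_epoch) == 3

# 3) Assuming the checkpointed scheduler counts in consumed samples,
#    linear warmup to 6e-05 over 216320 samples lands close to the
#    learning rate logged at iteration 6220 below:
print(6e-05 * 194_400 / 216_320)   # ~5.39e-05 vs. the logged 5.378E-05
```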
time (ms) | model-and-optimizer-setup: 99723.96 | train/valid/test-data-iterators-setup: 5641.98
[before the start of training step] datetime: 2021-09-25 04:29:23
[2021-09-25 04:29:23,929] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-25 04:29:23,930] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 1] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 225] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 21700.0 | max reserved: 21700.0
[Rank 226] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 2] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23406.0 | max reserved: 23406.0
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23726.0 | max reserved: 23726.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68896484375 | reserved: 22492.0 | max reserved: 22492.0
[Rank 3] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23374.0 | max reserved: 23374.0
[Rank 227] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 18925.1 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.332304E+00 | loss scale: 4096.0 | grad norm: 207900.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 33] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20130.0 | max reserved: 20130.0
[Rank 97] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 161] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 193] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 129] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19662.0 | max reserved: 19662.0
[Rank 65] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19946.0 | max reserved: 19946.0
[Rank 34] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20170.0 | max reserved: 20170.0
[Rank 162] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 130] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19390.0 | max reserved: 19390.0
[Rank 98] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19722.0 | max reserved: 19722.0
[Rank 194] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 66] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20094.0 | max reserved: 20094.0
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20456.0 | max reserved: 20456.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19908.0 | max reserved: 19908.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19828.0 | max reserved: 19828.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20328.0 | max reserved: 20328.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19396.0 | max reserved: 19396.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19572.0 | max reserved: 19572.0
[Rank 99] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19662.0 | max reserved: 19662.0
[Rank 67] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19966.0 | max reserved: 19966.0
[Rank 131] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19578.0 | max reserved: 19578.0
[Rank 35] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20078.0 | max reserved: 20078.0
[Rank 195] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18842.0 | max reserved: 18842.0
[Rank 163] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19066.0 | max reserved: 19066.0
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17419.3 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.312761E+00 | loss scale: 4096.0 | grad norm: 102010.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17458.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.325917E+00 | loss scale: 4096.0 | grad norm: 139671.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17438.0 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.330989E+00 | loss scale: 4096.0 | grad norm: 117429.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17495.4 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.330341E+00 | loss scale: 4096.0 | grad norm: 101380.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
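The per-rank memory lines above can be reproduced from PyTorch's CUDA allocator counters; a minimal sketch of such a reporter (the actual Megatron-DeepSpeed helper may format things differently):

```python
import torch

def report_memory(rank: int, iteration: int) -> str:
    """Format a per-rank memory line like the ones in this log (figures in MB)."""
    mb = 1024 * 1024
    return (
        f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
        f" | allocated: {torch.cuda.memory_allocated() / mb}"
        f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
        f" | reserved: {torch.cuda.memory_reserved() / mb}"
        f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
    )
```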
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17488.9 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.304220E+00 | loss scale: 4096.0 | grad norm: 137994.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17456.9 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.302861E+00 | loss scale: 4096.0 | grad norm: 117645.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 16818.4 | learning rate: 5.531E-05 | global batch size: 80 | lm loss: 6.313686E+00 | loss scale: 4096.0 | grad norm: 87880.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17519.8 | learning rate: 5.554E-05 | global batch size: 80 | lm loss: 6.270583E+00 | loss scale: 4096.0 | grad norm: 86063.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17461.4 | learning rate: 5.576E-05 | global batch size: 80 | lm loss: 6.315401E+00 | loss scale: 4096.0 | grad norm: 120394.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17455.8 | learning rate: 5.598E-05 | global batch size: 80 | lm loss: 6.326277E+00 | loss scale: 4096.0 | grad norm: 95784.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17431.8 | learning rate: 5.620E-05 | global batch size: 80 | lm loss: 6.333566E+00 | loss scale: 4096.0 | grad norm: 119951.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 16668.3 | learning rate: 5.640E-05 | global batch size: 80 | lm loss: 6.321040E+00 | loss scale: 2048.0 | grad norm: 54351.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 05:08:29] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 05:08:29] PULSE: tr8-104B is running for 41:28 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17330.6 | learning rate: 5.662E-05 | global batch size: 80 | lm loss: 6.297153E+00 | loss scale: 2048.0 | grad norm: 61555.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17390.9 | learning rate: 5.684E-05 | global batch size: 80 | lm loss: 6.296333E+00 | loss scale: 2048.0 | grad norm: 67211.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17338.2 | learning rate: 5.707E-05 | global batch size: 80 | lm loss: 6.309451E+00 | loss scale: 2048.0 | grad norm: 66671.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17380.7 | learning rate: 5.729E-05 | global batch size: 80 | lm loss: 6.301356E+00 | loss scale: 2048.0 | grad norm: 45299.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17366.7 | learning rate: 5.751E-05 | global batch size: 80 | lm loss: 6.335297E+00 | loss scale: 2048.0 | grad norm: 59836.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17383.7 | learning rate: 5.773E-05 | global batch size: 80 | lm loss: 6.303946E+00 | loss scale: 2048.0 | grad norm: 55594.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17402.0 | learning rate: 5.795E-05 | global batch size: 80 | lm loss: 6.335719E+00 | loss scale: 2048.0 | grad norm: 63504.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17371.7 | learning rate: 5.818E-05 | global batch size: 80 | lm loss: 6.278386E+00 | loss scale: 2048.0 | grad norm: 252963.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17394.4 | learning rate: 5.840E-05 | global batch size: 80 | lm loss: 6.309026E+00 | loss scale: 2048.0 | grad norm: 70987.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17385.8 | learning rate: 5.862E-05 | global batch size: 80 | lm loss: 6.352011E+00 | loss scale: 2048.0 | grad norm: 57730.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17363.4 | learning rate: 5.884E-05 | global batch size: 80 | lm loss: 6.338916E+00 | loss scale: 2048.0 | grad norm: 74089.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17402.1 | learning rate: 5.906E-05 | global batch size: 80 | lm loss: 6.307239E+00 | loss scale: 2048.0 | grad norm: 43748.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17495.0 | learning rate: 5.929E-05 | global batch size: 80 | lm loss: 6.336151E+00 | loss scale: 2048.0 | grad norm: 39508.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17462.6 | learning rate: 5.951E-05 | global batch size: 80 | lm loss: 6.356039E+00 | loss scale: 2048.0 | grad norm: 37602.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
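For post-hoc analysis it is convenient to turn these pipe-delimited iteration lines into records, e.g. to plot lm loss or grad norm over time. A hypothetical parser sketch (field names follow the log itself; this helper is ours, not part of Megatron):

```python
import re
from typing import Optional

ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)\s*\|")
FIELD_RE = re.compile(r"([a-zA-Z /()]+):\s*([-+0-9.Ee]+)")

def parse_iteration_line(line: str) -> Optional[dict]:
    """Parse one 'iteration N/ M | ...' log line into a dict of numeric fields."""
    m = ITER_RE.search(line)
    if not m:
        return None
    rec = {"iteration": int(m.group(1)), "total_iterations": int(m.group(2))}
    for key, val in FIELD_RE.findall(line):
        rec[key.strip()] = float(val)   # e.g. rec["lm loss"], rec["grad norm"]
    return rec
```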
iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17419.0 | learning rate: 5.973E-05 | global batch size: 80 | lm loss: 6.355389E+00 | loss scale: 2048.0 | grad norm: 44833.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17489.2 | learning rate: 5.995E-05 | global batch size: 80 | lm loss: 6.336482E+00 | loss scale: 2048.0 | grad norm: 54162.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17458.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337574E+00 | loss scale: 2048.0 | grad norm: 54595.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17515.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.356417E+00 | loss scale: 2048.0 | grad norm: 49879.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17447.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.369381E+00 | loss scale: 2048.0 | grad norm: 60963.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17448.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338880E+00 | loss scale: 2048.0 | grad norm: 59382.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17544.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.331310E+00 | loss scale: 2048.0 | grad norm: 62265.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 06:08:34] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 06:08:34] PULSE: tr8-104B is running for 1:41:33 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17470.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312242E+00 | loss scale: 2048.0 | grad norm: 58830.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17497.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.305868E+00 | loss scale: 2048.0 | grad norm: 95845.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17465.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323441E+00 | loss scale: 2048.0 | grad norm: 67257.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17539.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324122E+00 | loss scale: 2048.0 | grad norm: 68019.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17523.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.367977E+00 | loss scale: 2048.0 | grad norm: 72056.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17492.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.308113E+00 | loss scale: 2048.0 | grad norm: 149731.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17537.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354418E+00 | loss scale: 2048.0 | grad norm: 62412.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17517.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.357222E+00 | loss scale: 2048.0 | grad norm: 85289.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17515.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340989E+00 | loss scale: 2048.0 | grad norm: 56974.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17504.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343948E+00 | loss scale: 2048.0 | grad norm: 94205.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17528.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.349052E+00 | loss scale: 2048.0 | grad norm: 59116.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 17539.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.319823E+00 | loss scale: 2048.0 | grad norm: 89145.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17492.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.322467E+00 | loss scale: 2048.0 | grad norm: 79513.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17427.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.351400E+00 | loss scale: 2048.0 | grad norm: 80270.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17427.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.321815E+00 | loss scale: 2048.0 | grad norm: 89875.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17478.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318744E+00 | loss scale: 2048.0 | grad norm: 75317.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 06:55:50] PULSE: tr8-104B is scheduled to start in 1 day, 10:16:13 (at 2021-09-26T17:12:04) (1188168 on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 06:55:50] PULSE: tr8-104B is running for 2:28:49 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17509.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.297193E+00 | loss scale: 2048.0 | grad norm: 136372.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17514.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.303332E+00 | loss scale: 2048.0 | grad norm: 84302.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17530.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327809E+00 | loss scale: 2048.0 | grad norm: 84736.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18323.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.320579E+00 | loss scale: 2048.0 | grad norm: 68855.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 07:08:59] PULSE: tr8-104B is scheduled to start in 19:13:17 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 07:08:59] PULSE: tr8-104B is running for 2:41:58 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 18776.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.303013E+00 | loss scale: 2048.0 | grad norm: 69740.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18675.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.319376E+00 | loss scale: 2048.0 | grad norm: 83900.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
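Note the global batch size stepping from 80 to 96 at iteration 6750 above (and by a further 16 at several points below): the run ramps the batch size up as samples are consumed. A sketch of such a linear ramp-up, in the style of Megatron's --rampup-batch-size <start> <increment> <samples> flag; the start/ramp values below are illustrative, not this run's actual flags:

```python
def global_batch_size(consumed_samples: int,
                      start: int = 32,          # hypothetical starting batch size
                      increment: int = 16,      # step visible in the log: 80 -> 96 -> 112 -> ...
                      ramp_samples: int = 2_000_000,  # hypothetical ramp length in samples
                      final: int = 2048) -> int:
    """Batch size grows by `increment` at evenly spaced consumed-sample boundaries."""
    steps = (final - start) // increment         # number of increments in the full ramp
    samples_per_step = ramp_samples // steps     # samples spent at each batch size
    return min(final, start + increment * (consumed_samples // samples_per_step))
```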
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18605.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336406E+00 | loss scale: 2048.0 | grad norm: 62443.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18746.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.333478E+00 | loss scale: 2048.0 | grad norm: 73606.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18688.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.336754E+00 | loss scale: 2048.0 | grad norm: 96323.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18568.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.315503E+00 | loss scale: 2048.0 | grad norm: 65008.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18731.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.301308E+00 | loss scale: 2048.0 | grad norm: 70887.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18612.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.331754E+00 | loss scale: 2048.0 | grad norm: 78393.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18584.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.318947E+00 | loss scale: 4096.0 | grad norm: 175812.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18855.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.349559E+00 | loss scale: 4096.0 | grad norm: 150858.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 18778.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341676E+00 | loss scale: 4096.0 | grad norm: 374400.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18648.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.313033E+00 | loss scale: 4096.0 | grad norm: 153615.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18783.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332200E+00 | loss scale: 4096.0 | grad norm: 135045.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18757.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.370442E+00 | loss scale: 4096.0 | grad norm: 140003.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18547.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.426891E+00 | loss scale: 4096.0 | grad norm: 166603.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18775.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.383529E+00 | loss scale: 4096.0 | grad norm: 161102.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18674.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.362777E+00 | loss scale: 4096.0 | grad norm: 135239.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6930/ 159576 | consumed samples: 254192 | elapsed time per iteration (ms): 18723.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.348313E+00 | loss scale: 4096.0 | grad norm: 180298.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18629.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.304693E+00 | loss scale: 4096.0 | grad norm: 155481.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 18736.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.335081E+00 | loss scale: 4096.0 | grad norm: 170157.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:09:15] PULSE: tr8-104B is scheduled to start in 18:13:01 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:09:15] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:09:15] PULSE: tr8-104B is running for 3:42:14 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18679.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350162E+00 | loss scale: 4096.0 | grad norm: 146048.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 17405.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.358824E+00 | loss scale: 2048.0 | grad norm: 83822.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 18714.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.327154E+00 | loss scale: 2048.0 | grad norm: 55012.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18649.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.344659E+00 | loss scale: 2048.0 | grad norm: 62132.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18706.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.444662E+00 | loss scale: 2048.0 | grad norm: 98258.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 7000 | lm loss value: 7.174200E+00 | lm loss PPL: 1.305315E+03 |
------------------------------------------------------------------------------------------------
iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19904.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.142026E+01 | loss scale: 2048.0 | grad norm: 219645.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18580.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.367010E+01 | loss scale: 2048.0 | grad norm: 223286.170 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:32:28] PULSE: tr8-104B is scheduled to start in 17:49:48 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:32:28] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:32:28] PULSE: tr8-104B is running for 4:05:27 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18402.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.182180E+01 | loss scale: 2048.0 | grad norm: 19931.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18461.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.981701E+00 | loss scale: 2048.0 | grad norm: 205737.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18431.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.425107E+00 | loss scale: 2048.0 | grad norm: 195793.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18498.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.606621E+00 | loss scale: 2048.0 | grad norm: 50379.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18340.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.027315E+00 | loss scale: 2048.0 | grad norm: 37173.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
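The "lm loss PPL" printed in the validation blocks is simply the exponential of the lm loss; the figures at iteration 7000 check out:

```python
import math

lm_loss = 7.174200          # lm loss value at iteration 7000
print(math.exp(lm_loss))    # ~1.3052e3, matching the reported lm loss PPL of
                            # 1.305315E+03 up to the rounding of the printed loss
```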
iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18563.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.726066E+00 | loss scale: 2048.0 | grad norm: 22946.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18408.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.553810E+00 | loss scale: 2048.0 | grad norm: 16048.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18353.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.394469E+00 | loss scale: 2048.0 | grad norm: 10766.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 08:57:55] PULSE: tr8-104B is scheduled to start in 17:24:21 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 08:57:55] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 08:57:55] PULSE: tr8-104B is running for 4:30:54 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18511.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.327065E+00 | loss scale: 2048.0 | grad norm: 25940.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 18333.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.337917E+00 | loss scale: 2048.0 | grad norm: 18319.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18249.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.273988E+00 | loss scale: 2048.0 | grad norm: 14331.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18274.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.204887E+00 | loss scale: 2048.0 | grad norm: 21767.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 09:09:21] PULSE: tr8-104B is scheduled to start in 17:12:55 (at 2021-09-26T02:22:17) (1188168 on 'gpu_p13' partition)
[2021-09-25 09:09:21] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 09:09:21] PULSE: tr8-104B is running for 4:42:20 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18318.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.195872E+00 | loss scale: 2048.0 | grad norm: 14010.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per iteration (ms): 18337.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.136990E+00 | loss scale: 2048.0 | grad norm: 23189.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18344.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.222323E+00 | loss scale: 2048.0 | grad norm: 22610.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18312.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.156533E+00 | loss scale: 2048.0 | grad norm: 12376.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18417.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.084262E+00 | loss scale: 2048.0 | grad norm: 38647.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18396.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.110893E+00 | loss scale: 2048.0 | grad norm: 21520.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18408.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.294872E+00 | loss scale: 2048.0 | grad norm: 77171.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18333.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.155109E+00 | loss scale: 2048.0 | grad norm: 16921.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18398.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.042103E+00 | loss scale: 2048.0 | grad norm: 13510.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19100.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.964984E+00 | loss scale: 2048.0 | grad norm: 11355.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19781.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051522E+00 | loss scale: 2048.0 | grad norm: 14836.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19836.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.050404E+00 | loss scale: 2048.0 | grad norm: 32092.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19719.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.034865E+00 | loss scale: 2048.0 | grad norm: 25809.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19632.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.038512E+00 | loss scale: 2048.0 | grad norm: 19816.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19704.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.051814E+00 | loss scale: 2048.0 | grad norm: 13138.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19431.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.962708E+00 | loss scale: 2048.0 | grad norm: 15505.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19625.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.068867E+00 | loss scale: 2048.0 | grad norm: 26542.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19705.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.131171E+00 | loss scale: 2048.0 | grad norm: 59185.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19592.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.030576E+00 | loss scale: 2048.0 | grad norm: 32033.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 10:09:39] PULSE: tr8-104B is scheduled to start in 17:07:05 (at 2021-09-26T03:16:45) (1188168 on 'gpu_p13' partition)
[2021-09-25 10:09:39] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 10:09:39] PULSE: tr8-104B is running for 5:42:38 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19566.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.981178E+00 | loss scale: 2048.0 | grad norm: 29317.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19494.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.969751E+00 | loss scale: 2048.0 | grad norm: 20774.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19789.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.939532E+00 | loss scale: 2048.0 | grad norm: 22939.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19854.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.888672E+00 | loss scale: 2048.0 | grad norm: 30762.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19888.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.906486E+00 | loss scale: 2048.0 | grad norm: 18438.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19703.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.877617E+00 | loss scale: 2048.0 | grad norm: 15185.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19654.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.854189E+00 | loss scale: 2048.0 | grad norm: 15960.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19528.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894382E+00 | loss scale: 2048.0 | grad norm: 12842.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19701.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860787E+00 | loss scale: 2048.0 | grad norm: 15167.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19702.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.859363E+00 | loss scale: 2048.0 | grad norm: 23062.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19933.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.860333E+00 | loss scale: 2048.0 | grad norm: 32840.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19857.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.824039E+00 | loss scale: 2048.0 | grad norm: 14512.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19438.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.828743E+00 | loss scale: 2048.0 | grad norm: 22065.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 19647.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.799754E+00 | loss scale: 4096.0 | grad norm: 49640.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19818.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.815539E+00 | loss scale: 4096.0 | grad norm: 22148.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19788.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.894387E+00 | loss scale: 4096.0 | grad norm: 36912.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 19799.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.841101E+00 | loss scale: 4096.0 | grad norm: 23983.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 11:03:46,249] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18021.67
iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21444.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.821138E+00 | loss scale: 4096.0 | grad norm: 27340.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 11:09:42] PULSE: tr8-104B is scheduled to start in 17:10:43 (at 2021-09-26T04:20:26) (1188168 on 'gpu_p13' partition)
[2021-09-25 11:09:42] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 11:09:42] PULSE: tr8-104B is running for 6:42:41 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.839085E+00 | loss scale: 4096.0 | grad norm: 27168.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19673.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.866766E+00 | loss scale: 4096.0 | grad norm: 35661.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19547.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.895227E+00 | loss scale: 4096.0 | grad norm: 30950.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.974333E+00 | loss scale: 4096.0 | grad norm: 58146.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
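The checkpoint path in the save at iteration 7500 follows DeepSpeed's layout: save_checkpoint(save_dir, tag) writes the model states under a directory named after the tag, here "global_step{N}". A small, runnable sketch of that path construction (the Path value is the one from the log):

```python
from pathlib import Path

save_dir = Path("/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints")
iteration = 7500
# DeepSpeed tags the checkpoint directory "global_step{N}"; the rank-0 model
# states file seen in the log then lives at:
ckpt = save_dir / f"global_step{iteration}" / "mp_rank_00_model_states.pt"
print(ckpt)
```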
iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19670.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.993269E+00 | loss scale: 4096.0 | grad norm: 59358.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19932.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.018776E+00 | loss scale: 4096.0 | grad norm: 26693.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19801.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.954316E+00 | loss scale: 4096.0 | grad norm: 56910.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19757.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.019042E+00 | loss scale: 4096.0 | grad norm: 31511.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19717.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.002568E+00 | loss scale: 4096.0 | grad norm: 35214.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19801.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.968073E+00 | loss scale: 4096.0 | grad norm: 40886.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19491.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.959355E+00 | loss scale: 4096.0 | grad norm: 37865.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19606.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.927076E+00 | loss scale: 4096.0 | grad norm: 32908.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19669.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 7.079063E+00 | loss scale: 4096.0 | grad norm: 43561.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19813.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.977676E+00 | loss scale: 4096.0 | grad norm: 33954.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 20182.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071407E+00 | loss scale: 4096.0 | grad norm: 139629.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20921.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.133433E+00 | loss scale: 4096.0 | grad norm: 151598.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20923.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.093058E+00 | loss scale: 4096.0 | grad norm: 75854.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20468.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.040206E+00 | loss scale: 4096.0 | grad norm: 68735.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-25 12:10:01] PULSE: tr8-104B is scheduled to start in 18:54:29 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 12:10:01] PULSE: tr8-104B is running for 7:43:00 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20712.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.991071E+00 | loss scale: 4096.0 | grad norm: 49058.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20803.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.999660E+00 | loss scale: 4096.0 | grad norm: 50810.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 21027.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.148920E+00 | loss scale: 4096.0 | grad norm: 34526.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20621.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.952879E+00 | loss scale: 4096.0 | grad norm: 46587.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20787.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.077150E+00 | loss scale: 4096.0 | grad norm: 53834.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20790.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.024051E+00 | loss scale: 4096.0 | grad norm: 108296.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20756.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185934E+00 | loss scale: 4096.0 | grad norm: 40243.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration (ms): 20678.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.155985E+00 | loss scale: 4096.0 | grad norm: 45818.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20656.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.028696E+00 | loss scale: 4096.0 | grad norm: 54814.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20773.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.962093E+00 | loss scale: 4096.0 | grad norm: 57105.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.054767E+00 | loss scale: 4096.0 | grad norm: 74767.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20748.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.948767E+00 | loss scale: 4096.0 | grad norm: 103822.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20609.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.995116E+00 | loss scale: 4096.0 | grad norm: 70594.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20891.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.140380E+00 | loss scale: 4096.0 | grad norm: 50257.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20736.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.051595E+00 | loss scale: 4096.0 | grad norm: 62967.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20790.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.921895E+00 | loss scale: 4096.0 | grad norm: 104168.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 20774.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.071528E+00 | loss scale: 4096.0 | grad norm: 193610.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20837.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.086633E+00 | loss scale: 4096.0 | grad norm: 56330.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 13:10:06] PULSE: tr8-104B is scheduled to start in 17:54:24 (at 2021-09-26T07:04:31) (1188168 on 'gpu_p13' partition) [2021-09-25 13:10:06] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one 
(1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 13:10:06] PULSE: tr8-104B is running for 8:43:05 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20746.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.156522E+00 | loss scale: 4096.0 | grad norm: 137295.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20983.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.996352E+00 | loss scale: 4096.0 | grad norm: 67763.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20640.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.985654E+00 | loss scale: 4096.0 | grad norm: 113013.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20742.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.976338E+00 | loss scale: 4096.0 | grad norm: 73140.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20679.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.917073E+00 | loss scale: 4096.0 | grad norm: 83861.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20531.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.971965E+00 | loss scale: 4096.0 | grad norm: 57978.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20446.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117603E+00 | loss scale: 4096.0 | grad norm: 218144.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20823.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.029739E+00 | loss scale: 4096.0 | grad norm: 46987.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20775.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.972835E+00 | loss scale: 4096.0 | grad norm: 59193.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20508.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.890491E+00 | loss scale: 8192.0 | grad norm: 102786.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration (ms): 20983.1 | learning rate: 
6.000E-05 | global batch size: 128 | lm loss: 6.927078E+00 | loss scale: 8192.0 | grad norm: 117997.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20495.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.823578E+00 | loss scale: 8192.0 | grad norm: 123947.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 13:53:58,625] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=17, lr=[5.999979430007177e-05, 5.999979430007177e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 8000 loss: 6.8207 iter time (s): 0.010 samples/sec: 13060.948 iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20659.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.884979E+00 | loss scale: 8192.0 | grad norm: 131468.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------ validation loss at iteration 8000 | lm loss value: 6.791678E+00 | lm loss PPL: 8.904064E+02 | ------------------------------------------------------------------------------------------------ iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22008.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.826038E+00 | loss scale: 8192.0 | grad norm: 154245.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20587.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.870419E+00 | loss scale: 8192.0 | grad norm: 129858.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21288.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.928481E+00 | loss scale: 8192.0 | grad norm: 226677.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21881.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.896291E+00 | loss scale: 8192.0 | grad norm: 205623.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-25 14:10:08] PULSE: tr8-104B is scheduled to start in 17:26:04 (at 2021-09-26T07:36:13) (1188168 on 'gpu_p13' partition) [2021-09-25 14:10:08] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition) [2021-09-25 14:10:08] PULSE: tr8-104B is running for 9:43:07 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21696.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.873873E+00 | loss scale: 8192.0 | grad norm: 146153.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 8060/ 159576 | consumed samples: 382864 | 
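The PPL printed in the validation blocks is just the exponential of the lm loss (natural log), so the validation rows are easy to sanity-check. A minimal check in Python, using the iteration-8000 values above:

    import math

    # values from the "validation loss at iteration 8000" block above
    lm_loss = 6.791678
    print(math.exp(lm_loss))  # ~890.4064, i.e. the logged lm loss PPL 8.904064E+02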
iteration | consumed samples | elapsed ms/iter | global batch size | lm loss | loss scale | grad norm
8010 | 375920 | 22008.9 | 128 | 6.826038E+00 | 8192.0 | 154245.241
8020 | 377200 | 20587.9 | 128 | 6.870419E+00 | 8192.0 | 129858.542
8030 | 378544 | 21288.4 | 144 | 6.928481E+00 | 8192.0 | 226677.481
8040 | 379984 | 21881.6 | 144 | 6.896291E+00 | 8192.0 | 205623.823

[2021-09-25 14:10:08] PULSE: tr8-104B is scheduled to start in 17:26:04 (at 2021-09-26T07:36:13) (1188168 on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 14:10:08] PULSE: tr8-104B is running for 9:43:07 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8050 | 381424 | 21696.5 | 144 | 6.873873E+00 | 8192.0 | 146153.031
8060 | 382864 | 21810.7 | 144 | 6.853185E+00 | 8192.0 | 101607.158
8070 | 384304 | 21802.4 | 144 | 6.850246E+00 | 8192.0 | 139070.087
8080 | 385744 | 21831.7 | 144 | 6.848817E+00 | 8192.0 | 129639.082
8090 | 387184 | 21715.3 | 144 | 6.856639E+00 | 8192.0 | 200364.806
8100 | 388624 | 21801.4 | 144 | 6.869398E+00 | 8192.0 | 141893.384
8110 | 390064 | 21693.5 | 144 | 6.834469E+00 | 8192.0 | 133792.650
8120 | 391504 | 21798.3 | 144 | 6.845126E+00 | 8192.0 | 196465.435
8130 | 392944 | 21718.4 | 144 | 6.864041E+00 | 8192.0 | 234002.522
8140 | 394384 | 20974.7 | 144 | 6.866895E+00 | 8192.0 | 214792.051
8150 | 395824 | 20962.3 | 144 | 6.949483E+00 | 4096.0 | 129105.294
8160 | 397264 | 21839.6 | 144 | 6.982524E+00 | 4096.0 | 104094.455
8170 | 398704 | 21626.3 | 144 | 6.968035E+00 | 4096.0 | 85705.545
8180 | 400144 | 21733.4 | 144 | 6.983526E+00 | 4096.0 | 140563.515
8190 | 401584 | 21768.5 | 144 | 7.016048E+00 | 4096.0 | 72531.033
8200 | 403024 | 21929.8 | 144 | 6.996774E+00 | 4096.0 | 128628.095
8210 | 404464 | 21876.8 | 144 | 6.954953E+00 | 4096.0 | 114237.351

[2021-09-25 15:10:12] PULSE: tr8-104B is scheduled to start in 20:25:18 (at 2021-09-26T11:35:31) (1188168 on 'gpu_p13' partition)
[2021-09-25 15:10:12] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 15:10:12] PULSE: tr8-104B is running for 10:43:11 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8220 | 405904 | 21992.9 | 144 | 6.927856E+00 | 4096.0 | 191859.936
8230 | 407344 | 21845.4 | 144 | 6.915263E+00 | 4096.0 | 136325.623
8240 | 408784 | 21179.2 | 144 | 6.864025E+00 | 2048.0 | 118355.574
8250 | 410224 | 21688.2 | 144 | 6.873029E+00 | 2048.0 | 72612.289
8260 | 411664 | 21621.0 | 144 | 6.963725E+00 | 2048.0 | 77677.833
8270 | 413104 | 21832.0 | 144 | 6.939199E+00 | 2048.0 | 80021.251
8280 | 414544 | 21967.3 | 144 | 6.919482E+00 | 2048.0 | 58905.568
8290 | 415984 | 21671.6 | 144 | 6.919662E+00 | 2048.0 | 52571.274
8300 | 417424 | 21755.6 | 144 | 7.024297E+00 | 2048.0 | 77079.083
8310 | 418864 | 21909.8 | 144 | 7.234490E+00 | 2048.0 | 102216.544
8320 | 420304 | 21566.6 | 144 | 7.228243E+00 | 2048.0 | 88135.536
8330 | 421744 | 22069.0 | 144 | 7.068048E+00 | 2048.0 | 65341.009
8340 | 423184 | 21682.1 | 144 | 7.049673E+00 | 2048.0 | 45586.386
8350 | 424624 | 21918.1 | 144 | 7.033588E+00 | 2048.0 | 60230.392
8360 | 426160 | 22474.7 | 160 | 7.032515E+00 | 2048.0 | 55714.258
8370 | 427760 | 22723.0 | 160 | 7.051062E+00 | 2048.0 | 68784.584

[2021-09-25 16:10:22] PULSE: tr8-104B is scheduled to start in 19:16:12 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 16:10:22] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 16:10:22] PULSE: tr8-104B is running for 11:43:21 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8380 | 429360 | 22974.1 | 160 | 7.025337E+00 | 2048.0 | 89725.468
8390 | 430960 | 22266.9 | 160 | 7.010270E+00 | 1024.0 | 33629.138
8400 | 432560 | 22964.2 | 160 | 7.020833E+00 | 1024.0 | 46812.316
8410 | 434160 | 22923.5 | 160 | 7.044554E+00 | 1024.0 | 55335.802
8420 | 435760 | 22690.3 | 160 | 7.074860E+00 | 1024.0 | 27018.225
8430 | 437360 | 22997.6 | 160 | 7.108445E+00 | 1024.0 | 95058.404
8440 | 438960 | 22696.4 | 160 | 7.128921E+00 | 1024.0 | 44470.175
8450 | 440560 | 22728.4 | 160 | 7.037349E+00 | 1024.0 | 32995.810
8460 | 442160 | 22856.0 | 160 | 7.064864E+00 | 1024.0 | 23093.772
8470 | 443760 | 22824.5 | 160 | 7.057752E+00 | 1024.0 | 34580.324
8480 | 445360 | 22939.9 | 160 | 7.111783E+00 | 1024.0 | 30415.135
8490 | 446960 | 22647.3 | 160 | 7.077787E+00 | 1024.0 | 44228.518
8500 | 448560 | 22870.1 | 160 | 7.017307E+00 | 1024.0 | 31106.331

[2021-09-25 17:00:02] PULSE: tr8-104B is scheduled to start in 18:26:32 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:00:02] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1185639_[2-10%1] on 'gpu_p13' partition)
[2021-09-25 17:00:02] PULSE: tr8-104B is running for 12:33:01 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8510 | 450160 | 22836.1 | 160 | 7.033496E+00 | 1024.0 | 84589.712
8520 | 451760 | 22678.6 | 160 | 7.034415E+00 | 1024.0 | 45889.295
8530 | 453360 | 22820.3 | 160 | 7.022775E+00 | 1024.0 | 46421.613

[2021-09-25 17:10:31] PULSE: tr8-104B is scheduled to start in 18:16:03 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:10:31] PULSE: tr8-104B is running for 12:43:30 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8540 | 454960 | 22803.2 | 160 | 7.015056E+00 | 1024.0 | 49138.667
8550 | 456560 | 22969.4 | 160 | 7.037695E+00 | 1024.0 | 72675.159
8560 | 458160 | 22624.1 | 160 | 7.040105E+00 | 1024.0 | 55417.219
8570 | 459760 | 22663.1 | 160 | 7.066528E+00 | 1024.0 | 48492.969

[2021-09-25 17:26:58] PULSE: tr8-104B is scheduled to start in 17:59:36 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 17:26:58] PULSE: tr8-104B is running for 12:59:57 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8580 | 461360 | 22688.8 | 160 | 7.087028E+00 | 1024.0 | 46974.842
8590 | 462960 | 22699.4 | 160 | 7.089204E+00 | 1024.0 | 44702.862
8600 | 464560 | 22777.7 | 160 | 7.149306E+00 | 1024.0 | 261339.801
8610 | 466160 | 22975.5 | 160 | 7.167276E+00 | 1024.0 | 105455.551
8620 | 467760 | 23048.5 | 160 | 7.078442E+00 | 1024.0 | 84212.423
8630 | 469360 | 22799.5 | 160 | 7.081234E+00 | 1024.0 | 52121.419
8640 | 470960 | 22720.5 | 160 | 7.109283E+00 | 1024.0 | 48651.489
8650 | 472560 | 22695.2 | 160 | 7.118199E+00 | 1024.0 | 26046.891
8660 | 474320 | 23933.5 | 176 | 7.064212E+00 | 1024.0 | 40523.058
8670 | 476080 | 23798.1 | 176 | 7.051229E+00 | 1024.0 | 28160.238
8680 | 477840 | 23923.9 | 176 | 7.036906E+00 | 1024.0 | 51047.866
8690 | 479600 | 23651.1 | 176 | 7.073657E+00 | 1024.0 | 141610.865

[2021-09-25 18:10:35] PULSE: tr8-104B is scheduled to start in 17:15:59 (at 2021-09-26T11:26:35) (1188168 on 'gpu_p13' partition)
[2021-09-25 18:10:35] PULSE: tr8-104B is running for 13:43:34 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8700 | 481360 | 23943.4 | 176 | 7.071510E+00 | 1024.0 | 24381.440
8710 | 483120 | 23910.3 | 176 | 7.190697E+00 | 1024.0 | 41525.807
8720 | 484880 | 23923.5 | 176 | 7.332158E+00 | 1024.0 | 23580.074
8730 | 486640 | 23664.9 | 176 | 7.250137E+00 | 1024.0 | 33934.114
8740 | 488400 | 24002.8 | 176 | 7.134158E+00 | 1024.0 | 18917.778
8750 | 490160 | 23812.9 | 176 | 7.133132E+00 | 1024.0 | 24524.875
8760 | 491920 | 24164.0 | 176 | 7.089709E+00 | 1024.0 | 18466.411
8770 | 493680 | 23763.0 | 176 | 7.075866E+00 | 1024.0 | 21160.208
8780 | 495440 | 23757.0 | 176 | 7.105405E+00 | 1024.0 | 21012.399
8790 | 497200 | 23726.0 | 176 | 7.119524E+00 | 1024.0 | 19184.310

[2021-09-25 18:51:17] PULSE: tr8-104B is scheduled to start in 19:55:07 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition)
[2021-09-25 18:51:17] PULSE: tr8-104B is running for 14:24:16 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8800 | 498960 | 23872.5 | 176 | 7.150304E+00 | 1024.0 | 20582.002
8810 | 500720 | 23674.3 | 176 | 7.121466E+00 | 1024.0 | 26026.638
8820 | 502480 | 23655.3 | 176 | 7.227619E+00 | 1024.0 | 19493.231
8830 | 504240 | 24040.7 | 176 | 7.202127E+00 | 1024.0 | 21130.889
8840 | 506000 | 23751.6 | 176 | 7.102602E+00 | 1024.0 | 15258.781

[2021-09-25 19:10:38] PULSE: tr8-104B is scheduled to start in 19:35:46 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition)
[2021-09-25 19:10:38] PULSE: tr8-104B is running for 14:43:37 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

8850 | 507760 | 23681.3 | 176 | 7.106478E+00 | 1024.0 | 15650.558
8860 | 509520 | 23830.0 | 176 | 7.077826E+00 | 1024.0 | 13271.961
8870 | 511280 | 23830.3 | 176 | 7.083195E+00 | 1024.0 | 13942.816
8880 | 513040 | 23893.7 | 176 | 7.101151E+00 | 1024.0 | 17666.067
8890 | 514800 | 23733.4 | 176 | 7.130984E+00 | 2048.0 | 41179.422
8900 | 516560 | 23693.0 | 176 | 7.084023E+00 | 2048.0 | 32703.102
8910 | 518320 | 23793.1 | 176 | 7.094463E+00 | 2048.0 | 46954.552
8920 | 520112 | 23988.6 | 192 | 7.094890E+00 | 2048.0 | 20910.711
8930 | 522032 | 24780.5 | 192 | 7.112840E+00 | 2048.0 | 23723.304
8940 | 523952 | 24880.9 | 192 | 7.157214E+00 | 2048.0 | 35769.072
8950 | 525872 | 24820.3 | 192 | 7.212303E+00 | 2048.0 | 20241.796
8960 | 527792 | 24706.7 | 192 | 7.215181E+00 | 2048.0 | 48969.302
8970 | 529712 | 23528.3 | 192 | - | 1024.0 | 156762.139
8980 | 531632 | 18302.5 | 192 | - | 2.0 | 5927.930
8990 | 533552 | 17645.0 | 192 | - | 1.0 | 5927.930

[2021-09-25 20:10:52] PULSE: tr8-104B is scheduled to start in 18:35:32 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition)
[2021-09-25 20:10:52] PULSE: tr8-104B is running for 15:43:51 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

9000 | 535472 | 17316.3 | 192 | - | 1.0 | 5927.930

------------------------------------------------------------------------------------------------
validation loss at iteration 9000 | lm loss value: 7.256732E+00 | lm loss PPL: 1.417617E+03 |
------------------------------------------------------------------------------------------------

saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-25 20:11:32,719] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17709.49
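The loss scale column above is standard dynamic loss scaling for fp16 training: the scale is halved whenever a step produces inf/nan gradients and is only raised again after a long run of overflow-free steps. That is what the climb 4096.0 -> 8192.0 near iteration 7970 and the collapse 1024.0 -> 2.0 -> 1.0 around iteration 8980 reflect; note that the lm loss field also drops out of the records from iteration 8970 on (shown as "-" in the tables). A minimal sketch of the usual policy, with illustrative window and minimum-scale values rather than this run's actual DeepSpeed configuration:

    class DynamicLossScaler:
        """Sketch of a generic dynamic loss scaling policy (not DeepSpeed's exact code)."""

        def __init__(self, scale=4096.0, window=1000, min_scale=1.0):
            self.scale = scale          # multiplier applied to the loss before backward
            self.window = window        # overflow-free steps required before raising the scale
            self.min_scale = min_scale  # floor, e.g. the 1.0 this log bottoms out at
            self.clean_steps = 0

        def update(self, found_inf_or_nan: bool) -> None:
            if found_inf_or_nan:
                # the optimizer step is skipped and the scale backs off
                self.scale = max(self.scale / 2.0, self.min_scale)
                self.clean_steps = 0
            else:
                self.clean_steps += 1
                if self.clean_steps % self.window == 0:
                    self.scale *= 2.0   # e.g. the 4096.0 -> 8192.0 move in the log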
iteration | consumed samples | elapsed ms/iter | global batch size | lm loss | loss scale | grad norm
9010 | 537392 | 21623.6 | 192 | - | 1.0 | 5927.930
9020 | 539312 | 17559.0 | 192 | - | 1.0 | 5927.930
9030 | 541232 | 17827.7 | 192 | - | 1.0 | 5927.930
9040 | 543152 | 17458.2 | 192 | - | 1.0 | 5927.930
9050 | 545072 | 17470.7 | 192 | - | 1.0 | 5927.930
9060 | 546992 | 17813.0 | 192 | - | 1.0 | 5927.930
9070 | 548912 | 17646.8 | 192 | - | 1.0 | 5927.930
9080 | 550832 | 17634.4 | 192 | - | 1.0 | 5927.930
9090 | 552752 | 17734.2 | 192 | - | 1.0 | 5927.930
9100 | 554672 | 17470.3 | 192 | - | 1.0 | 5927.930
9110 | 556592 | 17443.8 | 192 | - | 1.0 | 5927.930
9120 | 558512 | 17456.2 | 192 | - | 1.0 | 5927.930
9130 | 560432 | 17374.7 | 192 | - | 1.0 | 5927.930
9140 | 562352 | 17541.4 | 192 | - | 1.0 | 5927.930
9150 | 564272 | 17680.4 | 192 | - | 1.0 | 5927.930
9160 | 566192 | 17412.1 | 192 | - | 1.0 | 5927.930
9170 | 568208 | 18281.1 | 208 | - | 1.0 | 5927.930
9180 | 570288 | 18627.2 | 208 | - | 1.0 | 5927.930
9190 | 572368 | 18546.6 | 208 | - | 1.0 | 5927.930

[2021-09-25 21:10:54] PULSE: tr8-104B is scheduled to start in 17:35:30 (at 2021-09-26T14:46:25) (1188168 on 'gpu_p13' partition)
[2021-09-25 21:10:54] PULSE: tr8-104B is running for 16:43:53 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

9200 | 574448 | 18675.7 | 208 | - | 1.0 | 5927.930
9210 | 576528 | 18679.9 | 208 | - | 1.0 | 5927.930
9220 | 578608 | 18524.7 | 208 | - | 1.0 | 5927.930
9230 | 580688 | 18762.7 | 208 | - | 1.0 | 5927.930
9240 | 582768 | 18695.7 | 208 | - | 1.0 | 5927.930
9250 | 584848 | 18780.0 | 208 | - | 1.0 | 5927.930
9260 | 586928 | 18593.2 | 208 | - | 1.0 | 5927.930
9270 | 589008 | 18476.6 | 208 | - | 1.0 | 5927.930
9280 | 591088 | 18595.2 | 208 | - | 1.0 | 5927.930
9290 | 593168 | 18498.1 | 208 | - | 1.0 | 5927.930
9300 | 595248 | 18531.6 | 208 | - | 1.0 | 5927.930
9310 | 597328 | 18538.6 | 208 | - | 1.0 | 5927.930
9320 | 599408 | 18768.3 | 208 | - | 1.0 | 5927.930
9330 | 601488 | 18445.0 | 208 | - | 1.0 | 5927.930
9340 | 603568 | 18700.8 | 208 | - | 1.0 | 5927.930
9350 | 605648 | 18716.7 | 208 | - | 1.0 | 5927.930
9360 | 607728 | 18488.0 | 208 | - | 1.0 | 5927.930
9370 | 609808 | 18621.0 | 208 | - | 1.0 | 5927.930
9380 | 611888 | 18781.4 | 208 | - | 1.0 | 5927.930
9390 | 613968 | 18582.4 | 208 | - | 1.0 | 5927.930

[2021-09-25 22:11:04] PULSE: tr8-104B is scheduled to start in 17:17:05 (at 2021-09-26T15:28:10) (1188168 on 'gpu_p13' partition)
[2021-09-25 22:11:04] PULSE: tr8-104B is running for 17:44:03 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

9400 | 616192 | 19918.8 | 224 | - | 1.0 | 5927.930
9410 | 618432 | 19675.6 | 224 | - | 1.0 | 5927.930
9420 | 620672 | 19904.3 | 224 | - | 1.0 | 5927.930
9430 | 622912 | 19702.9 | 224 | - | 1.0 | 5927.930
9440 | 625152 | 19798.2 | 224 | - | 1.0 | 5927.930
9450 | 627392 | 19797.6 | 224 | - | 1.0 | 5927.930
9460 | 629632 | 20223.0 | 224 | - | 1.0 | 5927.930
9470 | 631872 | 19847.6 | 224 | - | 1.0 | 5927.930
9480 | 634112 | 19783.5 | 224 | - | 1.0 | 5927.930
9490 | 636352 | 19768.8 | 224 | - | 1.0 | 5927.930
9500 | 638592 | 19836.7 | 224 | - | 1.0 | 5927.930
9510 | 640832 | 19791.2 | 224 | - | 1.0 | 5927.930
9520 | 643072 | 19677.8 | 224 | - | 1.0 | 5927.930
9530 | 645312 | 19695.3 | 224 | - | 1.0 | 5927.930
9540 | 647552 | 19697.0 | 224 | - | 1.0 | 5927.930
9550 | 649792 | 19776.4 | 224 | - | 1.0 | 5927.930
9560 | 652032 | 19726.6 | 224 | - | 1.0 | 5927.930
9570 | 654272 | 19764.1 | 224 | - | 1.0 | 5927.930

[2021-09-25 23:11:05] PULSE: tr8-104B is scheduled to start in 18:13:44 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
[2021-09-25 23:11:05] PULSE: tr8-104B is running for 18:44:04 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)

9580 | 656512 | 19889.3 | 224 | - | 1.0 | 5927.930
9590 | 658752 | 19672.3 | 224 | - | 1.0 | 5927.930
9600 | 660992 | 19668.0 | 224 | - | 1.0 | 5927.930
9610 | 663360 | 20660.1 | 240 | - | 1.0 | 5927.930
9620 | 665760 | 20759.5 | 240 | - | 1.0 | 5927.930
9630 | 668160 | 20573.3 | 240 | - | 1.0 | 5927.930
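The global batch size column grows by 16 each time consumed samples cross a fixed threshold (128 -> 144 -> ... -> 240 in this excerpt, roughly every ~48k samples), i.e. a linear batch-size ramp-up of the kind Megatron-LM configures with --rampup-batch-size <start batch size> <increment> <ramp-up samples>. A sketch of that schedule; the constants below are illustrative assumptions, since the run's actual arguments are not recoverable from this excerpt:

    # Megatron-style linear batch-size ramp-up (sketch; constants are assumed).
    START, INCREMENT, TARGET = 32, 16, 2048
    RAMP_SAMPLES = 6_000_000

    def global_batch_size(consumed_samples: int) -> int:
        increments = (TARGET - START) // INCREMENT       # number of +INCREMENT steps
        samples_per_step = RAMP_SAMPLES // increments    # samples spent at each size
        gbs = START + INCREMENT * (consumed_samples // samples_per_step)
        return min(gbs, TARGET)

With these assumed constants the batch size steps up every ~47,600 samples, which is about the spacing between the size changes visible in the tables above.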
learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9650/ 159576 | consumed samples: 672960 | elapsed time per iteration (ms): 21312.3 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9660/ 159576 | consumed samples: 675360 | elapsed time per iteration (ms): 20596.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9670/ 159576 | consumed samples: 677760 | elapsed time per iteration (ms): 20413.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9680/ 159576 | consumed samples: 680160 | elapsed time per iteration (ms): 20820.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9690/ 159576 | consumed samples: 682560 | elapsed time per iteration (ms): 20882.2 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9700/ 159576 | consumed samples: 684960 | elapsed time per iteration (ms): 21320.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9710/ 159576 | consumed samples: 687360 | elapsed time per iteration (ms): 20632.6 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9720/ 159576 | consumed samples: 689760 | elapsed time per iteration (ms): 20593.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9730/ 159576 | consumed samples: 692160 | elapsed time per iteration (ms): 21160.0 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9740/ 159576 | consumed samples: 694560 | elapsed time per iteration (ms): 20918.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-26 00:11:13] PULSE: tr8-104B is scheduled to start in 17:13:36 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition) [2021-09-26 00:11:13] PULSE: tr8-104B is running for 19:44:12 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0) iteration 9750/ 159576 | consumed samples: 696960 | elapsed time per iteration (ms): 20828.1 | learning rate: 6.000E-05 | global 
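The step in global batch size from 224 to 240 around iteration 9610 (visible also in the consumed-samples deltas, which grow from 2240 to 2400 per 10 iterations) is the linear batch-size ramp-up at work. Below is a minimal sketch of how such a schedule maps consumed samples to a batch size, in the spirit of Megatron-LM's `--rampup-batch-size <start> <increment> <samples>` option; the default values are illustrative placeholders, not the actual tr8-104B settings, and the exact boundary handling in Megatron differs slightly.

```python
# Minimal sketch of a Megatron-style linear batch-size ramp-up.
# All default values below are hypothetical, not the tr8-104B config.
def global_batch_size(consumed_samples: int,
                      start_gbs: int = 32,
                      increment: int = 16,
                      ramp_samples: int = 2_000_000,
                      target_gbs: int = 2048) -> int:
    """Global batch size in effect after `consumed_samples` samples.

    Grows from `start_gbs` to `target_gbs` in steps of `increment`,
    spreading the steps evenly across the first `ramp_samples` samples.
    """
    if consumed_samples >= ramp_samples:
        return target_gbs
    steps_total = (target_gbs - start_gbs) // increment
    samples_per_step = ramp_samples // steps_total
    step = consumed_samples // samples_per_step
    return min(start_gbs + step * increment, target_gbs)
```

Ramping the batch up rather than starting at the target keeps early optimization steps frequent and cheap while the model is far from convergence, which is why the per-iteration time above also jumps (roughly 19.7 s to 20.6 s) exactly when the batch size does.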
iteration 9760/ 159576 | consumed samples: 699360 | elapsed time per iteration (ms): 20766.8 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-26 00:17:36,090] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step9768/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9768 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 22024.89
[exiting program after 1190.3113538821538 minutes] datetime: 2021-09-26 00:17:52
[2021-09-26 01:11:06] PULSE: tr8-104B is scheduled to start in 18:25:25 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 02:11:19] PULSE: tr8-104B is scheduled to start in 17:25:12 (at 2021-09-26T19:36:32) (1188168 on 'gpu_p13' partition)
[2021-09-26 03:11:35] PULSE: tr8-104B is scheduled to start in 19:51:55 (at 2021-09-26T23:03:31) (1188168 on 'gpu_p13' partition)
[2021-09-26 04:11:39] PULSE: tr8-104B is scheduled to start in 19:06:56 (at 2021-09-26T23:18:36) (1188168 on 'gpu_p13' partition)
[2021-09-26 05:11:41] PULSE: tr8-104B is scheduled to start in 18:19:12 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 06:11:46] PULSE: tr8-104B is scheduled to start in 17:19:07 (at 2021-09-26T23:30:54) (1188168 on 'gpu_p13' partition)
[2021-09-26 07:11:59] PULSE: tr8-104B is scheduled to start in 17:27:45 (at 2021-09-27T00:39:45) (1188168 on 'gpu_p13' partition)
[2021-09-26 08:12:02] PULSE: tr8-104B is scheduled to start in 12:30:49 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 09:12:23] PULSE: tr8-104B is scheduled to start in 11:30:28 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 10:12:24] PULSE: tr8-104B is scheduled to start in 10:30:27 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 11:12:28] PULSE: tr8-104B is scheduled to start in 9:30:23 (at 2021-09-26T20:42:52) (1188168 on 'gpu_p13' partition)
[2021-09-26 12:12:40] PULSE: tr8-104B is scheduled to start in 10:14:45 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 13:12:49] PULSE: tr8-104B is scheduled to start in 9:14:36 (at 2021-09-26T22:27:26) (1188168 on 'gpu_p13' partition)
[2021-09-26 14:12:56] PULSE: tr8-104B is scheduled to start in 8:33:42 (at 2021-09-26T22:46:39) (1188168 on 'gpu_p13' partition)
[2021-09-26 15:13:22] PULSE: tr8-104B is scheduled to start in 7:16:41 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 16:13:24] PULSE: tr8-104B is scheduled to start in 6:16:39 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 17:13:32] PULSE: tr8-104B is scheduled to start in 5:16:31 (at 2021-09-26T22:30:04) (1188168 on 'gpu_p13' partition)
[2021-09-26 18:13:29] PULSE: tr8-104B is scheduled to start in 9:13:25 (at 2021-09-27T03:26:55) (1188168 on 'gpu_p13' partition)
[2021-09-26 19:13:42] PULSE: tr8-104B is scheduled to start in 12:06:13 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
[2021-09-26 20:13:45] PULSE: tr8-104B is scheduled to start in 11:06:10 (at 2021-09-27T07:19:56) (1188168 on 'gpu_p13' partition)
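The clean "[exiting program after 1190.31 minutes]" shutdown immediately after the iteration-9768 checkpoint is the time-based exit guard: the job stops itself just under its 20-hour allocation so it can checkpoint and requeue instead of being killed mid-write by SLURM. A minimal sketch of such a check, in the spirit of Megatron-LM's `--exit-duration-in-mins` option (the variable names here are illustrative):

```python
# Minimal sketch of a wall-clock exit guard for preemptible jobs.
# Names are illustrative; Megatron-LM implements this behind
# --exit-duration-in-mins.
import time

TRAIN_START = time.time()  # recorded once at program start

def should_exit(exit_duration_in_mins: float) -> bool:
    """True once elapsed wall-clock time exceeds the budget, so the
    training loop can save a checkpoint and return cleanly."""
    elapsed_mins = (time.time() - TRAIN_START) / 60.0
    return elapsed_mins > exit_duration_in_mins
```

Checked once per iteration; when it fires, the loop saves a checkpoint and exits, which is exactly the save/exit pair visible above, followed by roughly 27 hours of PULSE lines while the follow-up job waits in the queue.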
[2021-09-26 21:14:04] PULSE: tr8-104B is scheduled to start in 18:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 22:14:04] PULSE: tr8-104B is scheduled to start in 17:20:04 (at 2021-09-27T15:34:09) (1188168 on 'gpu_p13' partition)
[2021-09-26 23:14:12] PULSE: tr8-104B is scheduled to start in 16:36:40 (at 2021-09-27T15:50:53) (1188168 on 'gpu_p13' partition)
[2021-09-27 00:14:11] PULSE: tr8-104B is scheduled to start in 15:32:33 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 01:14:15] PULSE: tr8-104B is scheduled to start in 14:32:29 (at 2021-09-27T15:46:45) (1188168 on 'gpu_p13' partition)
[2021-09-27 02:14:18] PULSE: tr8-104B is scheduled to start in 10:17:12 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
[2021-09-27 03:14:23] PULSE: tr8-104B is scheduled to start in 9:17:07 (at 2021-09-27T12:31:31) (1188168 on 'gpu_p13' partition)
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
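The warning above is emitted once per rank by the distributed launcher when the relaunched job starts (the raw log contains one copy per process; only one is kept here). Defaulting each worker to a single OpenMP thread prevents N ranks per node from each spawning a full set of CPU threads. A minimal sketch of pinning the value explicitly before the math libraries initialise; the right setting is workload-dependent and 1 is only the safe default:

```python
# Minimal sketch: pin the OpenMP thread count per process.
# Must run before importing numpy/torch, since the OpenMP runtime
# reads the variable once at initialisation.
import os

os.environ.setdefault("OMP_NUM_THREADS", "1")

import torch  # noqa: E402  (imported after the env var on purpose)
```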
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... 
[YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] ninjastochastic_transformer ................... [OKAY][NO] .......-------------------------------------------------- [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] ninja .................. [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- transformer ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] ninja .................. [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. 
compatible -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja .................. [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- transformer ............ [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY] cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] op name ................ installed .. compatible JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
(Identical op reports from the remaining ranks, interleaved across processes, omitted.)
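For reference, the same table can be regenerated outside a training run with DeepSpeed's bundled reporting tool; a minimal sketch, assuming DeepSpeed is installed in the active environment:

    # Print the extension op report (plus version/platform info) for this environment.
    ds_report

Because only cpu_adam is pre-installed here, every other op is compiled by ninja on first use. If the JIT pause at startup is unwanted, ops can instead be pre-built at install time with DeepSpeed's documented DS_BUILD_* switches; a sketch, with fused_adam chosen purely as an example:

    # Pre-build the fused Adam op at install time instead of JIT-compiling it;
    # DS_BUILD_OPS=1 would pre-build every op the system is compatible with.
    DS_BUILD_FUSED_ADAM=1 pip install deepspeed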
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] [OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] op name ................ installed .. compatible fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... 
[OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system fused_adam ............. [NO] ....... [OKAY] meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] async_io ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... utils[NO] ......................... [YES][NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... async_io[NO] ............... async_io[NO] ...................... [NO] ...... [OKAY] [NO] ....... transformer_inference[NO] .. [NO] ....... [OKAY] quantizer .............. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] transformer_inference transformer_inferenceutils.. ....................[NO] [NO][YES]....... .............[OKAY] [OKAY][OKAY] -------------------------------------------------- quantizer utils.............. utils..................[NO] [YES]......................... ...... [OKAY] [YES] [OKAY] ...... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference ..utils [NO].................. .......[YES] ......[OKAY] [OKAY] async_io ............... [NO] ....... [NO] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ installed................ .. installedcompatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] [YES] ...... [OKAY] fused_adam ............. [NO]fused_adam ....... .............[OKAY] [NO] fused_lamb....... .............[OKAY] [NO] ....... fused_lamb[OKAY] ............. [NO] ....... [OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY] ............ [NO]transformer ................... [NO][OKAY] ....... [OKAY] transformer ............ stochastic_transformer[NO] ........ [OKAY][NO] ninja .................. [OKAY] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... 
[OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

[The remaining ranks emitted identical op reports, async_io warnings, and environment info; those copies have been omitted.]
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- utils .................. [YES] ...... [OKAY] JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- fused_adam ............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- DeepSpeed C++/CUDA extension op report torch version .................... 1.8.1 -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch cuda version ............... 11.1 nvcc version ..................... 11.2 ninja .................. [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science -------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer ............ [NO] ....... [OKAY] -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] async_io ............... [NO] ....... [NO] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- utils .................. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- ninja .................. [OKAY] op name ................ installed .. 
compatible -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- op name ................ installed .. compatible deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ninja .................. [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] op name ................ installed .. compatible -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] utils .................. [YES] ...... 
[OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] transformer ............ [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.[NO] transformer_inference .. [NO] async_io....... [OKAY]............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference quantizer.. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninja .................. [OKAY].................. [OKAY]-------------------------------------------------- --------------------------------------------------op name ................ op nameinstalled .................. compatibleinstalled --------------------------------------------------.. compatible -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... [OKAY]............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam ............. [NO] fused_lamb....... .............[OKAY] [NO] ....... fused_lamb[OKAY] ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformer............ ............ [NO][NO] .............. [OKAY][OKAY] transformerstochastic_transformer ............ .[NO] [NO]....... ....... [OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 
1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ...............torch version 11.1.................... nvcc version1.8.1 ..................... 11.2torch cuda version deepspeed install path............... ...........11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']nvcc version .....................deepspeed info 11.2................... deepspeed install path0.4.2+bc17042, bc17042, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch 1.8, cuda 11.1deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] DeepSpeed general environment info: quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] ninja .................. [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. nvcc version ..................... 11.2 async_ioasync_io .............................. [NO][NO] .............. [NO][NO] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] ninja .................. [OKAY] utils .................. [YES] ...... [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... 
.......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] op name ................ installed .. compatible -------------------------------------------------- torch version .................... 1.8.1 cpu_adam ............... [YES] ...... [OKAY] torch cuda version ............... 11.1 fused_adam ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- fused_lamb ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] JIT compiled ops requires ninja sparse_attn ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ............... [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] fused_adam ............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] transformer ............ 
[NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
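Only cpu_adam and utils were pre-built here; the remaining ops get JIT-compiled via ninja on first use. As a minimal sketch (assuming a standard DeepSpeed installation, and using DeepSpeed's documented DS_BUILD_* build switches), the same report can be reproduced on demand and selected ops can be pre-built instead of JIT-compiled:

    # print the op report and environment info above on demand
    ds_report

    # pre-build selected ops at install time instead of JIT-compiling them at runtime
    DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install --no-cache-dir deepspeed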
The one warning repeated by every rank concerns the missing async I/O library:

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.

The async_io op only backs NVMe offload (e.g. ZeRO-Infinity), so its absence is harmless unless NVMe offload is in use.
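On a machine with root access (not the case on these compute nodes), the fix suggested by the warning would be, as a sketch:

    # install the kernel async-I/O development headers, then rebuild DeepSpeed with the op enabled
    apt install libaio-dev
    DS_BUILD_AIO=1 pip install --no-cache-dir deepspeed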
/bin/sh: line 0: type: git: not found   (repeated once per rank)

These messages appear because git is not installed on the compute nodes, so the startup code's `type git` check fails; the failure is harmless, and the version information baked into the install at build time (0.4.2+bc17042, bc17042, big-science) is reported instead.
11.1 DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... 
[OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- DeepSpeed general environment info: op name ................ installed .. compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 cpu_adam ............... [YES] ...... [OKAY] torch cuda version ............... 11.1 fused_adam ............. [NO] ....... [OKAY] nvcc version ..................... 11.2 fused_lamb ............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] sparse_attn ............ [NO] ....... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- DeepSpeed C++/CUDA extension op report /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. 
compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ninja .................. [OKAY]cpu_adam ...............-------------------------------------------------- [YES] ......op name ................[OKAY] installed .. compatible -------------------------------------------------- fused_adam ............. [NO] cpu_adam....... ...............[OKAY] [YES] ...... fused_lamb[OKAY] ............. [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] sparse_attn fused_lamb............ .............[NO] [NO]....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer sparse_attn ............. [NO][NO] ....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninjafused_adam ............................... [OKAY][NO] --------------------------------------------------....... op name[OKAY] ................ installed fused_lamb.. compatible............. --------------------------------------------------[NO] ....... [OKAY] DeepSpeed general environment info: cpu_adam ............... [YES] ...... 
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 ninja .................. [OKAY] -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op name ................ installed .. compatible fused_adam .............transformer [NO] ................... [NO][OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ....... [OKAY]fused_lamb cpu_adam ............... [YES] ...... [OKAY] ............. [NO] ....... stochastic_transformer[OKAY] fused_adam ............. [NO] ....... [OKAY] . [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found ninja .................. [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found op name ................ installed .. compatible -------------------------------------------------- /bin/sh: line 0: type: git: not found cpu_adam ............... [YES] ...... [OKAY] /bin/sh: line 0: type: git: not found fused_adam ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found fused_lamb ............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found transformer ............ [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch cuda version ............... 11.1 DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version torch version............... ....................11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science 1.8.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version .....................torch cuda version 11.2............... deepspeed install path11.1 ...........nvcc version .....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 11.2deepspeed info deepspeed install path................... 
...........0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ......deepspeed info torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... torch install path0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: DeepSpeed general environment info: torch version .................... 1.8.1 torch install path torch cuda version............... ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version .....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 11.2 deepspeed install pathtorch version ............................... 1.8.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch version .................... 1.8.1 torch cuda version ............... 11.1 deepspeed infotorch cuda version .................................. 0.4.2+bc17042, bc17042, big-science11.1 deepspeed wheel compiled w.nvcc version ........................... torch 1.8, cuda 11.111.2 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version ............... 11.1 torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 torch cuda version ............... 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda version ............... 11.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
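The table above is DeepSpeed's standard op/compatibility report (the same output its `ds_report` utility prints), and the environment block pins the exact torch/CUDA builds in use. A minimal sanity-check sketch of those version claims, using only plain torch calls; the expected values are copied from this log, not facts about any other setup:

```python
import torch

# Verify the interpreter sees the same builds the report above claims.
# Expected values ("1.8.x", "11.1") are copied from this log.
assert torch.__version__.startswith("1.8"), torch.__version__
assert torch.version.cuda == "11.1", torch.version.cuda
print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")
```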
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja using world size: 512, data-parallel-size: 16, tensor-model-parallel size: 4, pipeline-model-parallel size: 8 /bin/sh: line 0: type: git: not found using torch.float16 for parameters ... /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found ------------------------ arguments ------------------------ /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found accumulate_allreduce_grads_in_fp32 .............. False /bin/sh: line 0: type: git: not found adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 16
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1188168.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 42
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 512
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
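
A quick consistency check on the parallelism arguments above: the three parallel degrees multiply out to the world size, and the per-head dimension follows from hidden_size and num_attention_heads. A minimal sketch in plain Python (values copied from the argument dump; the variable names are ours, not Megatron's):

    # Recomputing the 3D-parallel geometry from the argument dump above.
    tensor_mp = 4        # tensor_model_parallel_size
    pipeline_mp = 8      # pipeline_model_parallel_size
    data_parallel = 16   # data_parallel_size

    # 4 tensor-parallel x 8 pipeline-parallel x 16 data-parallel replicas = 512 GPUs
    assert tensor_mp * pipeline_mp * data_parallel == 512   # world_size

    hidden_size = 16384
    num_attention_heads = 32
    # Each attention head gets hidden_size / num_heads channels.
    assert hidden_size // num_attention_heads == 512         # kv_channels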
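
The rampup notice corresponds to rampup_batch_size = ['16', '16', '6_000_000'] in the dump: start at a global batch size of 16, grow in increments of 16, and spread the growth over 6,000,000 consumed samples until global_batch_size = 2048 is reached. A rough sketch of that schedule, assuming the increments are spaced evenly over the ramp (Megatron's exact stepping may differ):

    def rampup_global_batch_size(consumed_samples,
                                 start=16, increment=16,
                                 ramp_samples=6_000_000, final=2048):
        """Approximate the batch-size rampup described in the log notice above."""
        steps = (final - start) // increment       # 127 increments of 16
        samples_per_step = ramp_samples // steps   # samples spent at each size
        step = min(consumed_samples // samples_per_step, steps)
        return start + step * increment

    assert rampup_global_batch_size(0) == 16
    assert rampup_global_batch_size(6_000_000) == 2048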
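
The learning-rate arguments (lr=6e-05, min_lr=6e-06, lr_warmup_samples=216320, lr_decay_samples=126953125, lr_decay_style=cosine) describe a sample-based linear warmup followed by cosine decay. A hedged sketch of the shape they imply, not the exact Megatron scheduler code:

    import math

    def lr_at(consumed_samples, lr=6e-05, min_lr=6e-06,
              warmup_samples=216_320, decay_samples=126_953_125):
        """Linear warmup to lr, then cosine decay to min_lr, keyed on samples."""
        if consumed_samples < warmup_samples:
            return lr * consumed_samples / warmup_samples
        progress = min((consumed_samples - warmup_samples) /
                       (decay_samples - warmup_samples), 1.0)
        return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

At progress 0 this returns the peak lr of 6e-05, and once lr_decay_samples have been consumed it stays flat at min_lr = 6e-06.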
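
The fp16 arguments also carry the loss-scaling configuration. Since loss_scale is set to 12.0 here, Megatron uses a static loss scale; initial_loss_scale=4294967296 (2**32), loss_scale_window=1000, hysteresis=2, and min_loss_scale=1.0 only take effect when loss_scale is unset and dynamic scaling kicks in. For reference, a minimal sketch of the standard dynamic loss-scaling policy those parameters configure (an illustrative class, not Megatron's implementation):

    class DynamicLossScaler:
        """Sketch of the dynamic loss-scaling policy behind the fp16 args above."""
        def __init__(self, init_scale=2**32, window=1000,
                     hysteresis=2, min_scale=1.0):
            self.scale = init_scale
            self.window = window                 # clean steps before growing
            self.min_scale = min_scale
            self.hysteresis = hysteresis         # overflows tolerated in a row
            self._hysteresis_left = hysteresis
            self._good_steps = 0

        def update(self, found_overflow):
            if found_overflow:
                self._good_steps = 0
                self._hysteresis_left -= 1
                if self._hysteresis_left <= 0:   # too many overflows: back off
                    self.scale = max(self.scale / 2, self.min_scale)
                    self._hysteresis_left = self.hysteresis
            else:
                self._good_steps += 1
                if self._good_steps >= self.window:  # stable: grow the scale
                    self.scale *= 2
                    self._good_steps = 0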
0.4.2+bc17042, bc17042, big-science...... torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... /bin/sh: line 0: type: git: not found [NO] ....... [NO] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`........ [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utils .................. [YES] ...... [OKAY] quantizer async_io.............. ...............[NO] [NO]....... .......[OKAY] [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. nvcc version ..................... 11.2 -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] async_io ............... [NO] ....... [NO] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science transformer_inference .. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................ ................installed installed.. compatible.. compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY] [NO] ....... fused_lamb[OKAY] ............. [NO] .......fused_lamb [OKAY]............. [NO] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [OKAY][NO] ....... transformer[OKAY] ............ [NO] ....... [OKAY]transformer ............ [NO] stochastic_transformer....... [OKAY]. [NO] ....... stochastic_transformer[OKAY] . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report nvcc version ..................... 11.2 -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninja .................. [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 op name ................ installed .. compatible -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] cpu_adam ............... [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science fused_adam ............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 fused_lamb ............. [NO] ....... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer ............ [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformer . [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] ninja .................. [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] op name ................ installed .. compatible -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] cpu_adam ............... [YES] ...... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_adam ............. [NO] ....... [OKAY] async_ioasync_io .............................. [NO] [NO]....... .......[NO] [NO] fused_lamb ............. [NO] ....... [OKAY] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] DeepSpeed C++/CUDA extension op report -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch install path ................... ...............0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] ninjaninjaninja .................................... ..................[OKAY][OKAY] [OKAY] utils .................. [YES] ...... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found op nameop nameop name ................................................ installedinstalledinstalled ...... compatiblecompatiblecompatible /bin/sh: line 0: type: git: not found -------------------------------------------------- /bin/sh: line 0: type: git: not found ------------------------------------------------------------------------------------------------------------------------------------------------------ /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found cpu_adamcpu_adam .............................. [YES][YES] ............ 
[OKAY][OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found cpu_adam fused_adamfused_adam .......................... [NO]...............[NO] .......[YES]....... [OKAY]......[OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found [OKAY]fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY] sparse_attn sparse_attn............fused_lamb .........................[NO] [NO][NO]....... .......[OKAY] [OKAY] .......transformertransformer [OKAY]............ ............ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] DeepSpeed general environment info: sparse_attn ............ [NO] ....... [OKAY] DeepSpeed general environment info: transformer ............ [NO] ....... [OKAY] torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] stochastic_transformer . [NO] ....... [OKAY] 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 DeepSpeed general environment info: nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed infodeepspeed wheel compiled w. ......................... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info: deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

[WARNING] async_io requires the libraries: ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
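For reference, the fields in the environment block above can be reproduced with a few lines of Python; nothing below is specific to this cluster.

```python
# Reproduce the "DeepSpeed general environment info" fields by hand.
import torch
import deepspeed

print("torch install path ...", torch.__file__)
print("torch version ........", torch.__version__)
print("torch cuda version ...", torch.version.cuda)
print("deepspeed info .......", deepspeed.__version__)
```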
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
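The padding arithmetic behind that line: Megatron-LM rounds the tokenizer vocabulary up so the embedding matrix splits evenly across tensor-parallel ranks. The effective multiple is `--make-vocab-size-divisible-by` times the tensor-parallel degree; 512 is assumed in the sketch below because it reproduces the logged numbers (the actual training arguments are not shown in this excerpt).

```python
# Minimal sketch of Megatron-LM's vocab padding, assuming an effective
# multiple of 512 (= make-vocab-size-divisible-by x tensor-parallel degree);
# the exact args are not visible in this log.
def padded_vocab_size(orig_size: int, multiple: int = 512) -> int:
    # Round up to the nearest multiple so every tensor-parallel shard of
    # the embedding gets the same number of rows.
    return ((orig_size + multiple - 1) // multiple) * multiple

new_size = padded_vocab_size(50257)
print(new_size, new_size - 50257)  # 50688 431 -- matches the log line above
```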
> setting codecarbon ...
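The codecarbon step refers to the CodeCarbon energy/emissions tracker. A minimal sketch of what that setup amounts to, assuming the standard `EmissionsTracker` API (the training fork's actual flag names and output directory are not shown in this log):

```python
# Hedged sketch of the codecarbon setup step; the output_dir value is
# hypothetical, and the training code wires this up through its own flags.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(output_dir="codecarbon")
tracker.start()
# ... training runs here ...
emissions_kg = tracker.stop()  # estimated emissions in kg CO2-eq
print(f"estimated emissions: {emissions_kg} kg CO2-eq")
```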
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... .................. .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name op name................ ................................installed................ installedinstalled.. installed .... compatible .. compatiblecompatible-------------------------------------------------- compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES]cpu_adam ............... .................................... [YES][OKAY][YES][YES] ...... ...... ...... [OKAY][OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adamfused_adam fused_lamb....................................... .............[NO][NO] [NO] [NO]..................... [OKAY].......[OKAY] [OKAY] [OKAY]fused_lamb fused_lamb fused_lamb ............. ............. ............. [NO] [NO] [NO] ....... ....... ....... [OKAY]sparse_attn[OKAY] [OKAY]............ [NO] ....... [OKAY] transformer ............ [NO]sparse_attn sparse_attn sparse_attn....... ............ ............ ............[OKAY][NO][NO] [NO].............. stochastic_transformer.......[OKAY][OKAY] [OKAY]transformer. transformer ............[NO]............transformer [NO] .......[NO] ............ [OKAY].............. [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer. . [NO][NO]. .............. [OKAY][NO] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op name op name ................ op name ................................installed installed..................installed ..compatible ..compatibleinstalled compatible----------------------------------------------------------------------------------------------------.. compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam [YES]cpu_adam[YES] ...... ............... ..................... [OKAY][OKAY][YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. [NO]............. .......[NO] fused_adam [OKAY]fused_adam ....... ............. fused_lamb[OKAY]............. [NO] .............[NO]....... fused_lamb .......[NO] [OKAY] .................... [OKAY] [OKAY][NO]fused_lamb fused_lamb.................... [OKAY].............[NO] [NO]....... sparse_attn.......[OKAY] ............[OKAY] [NO] sparse_attn....... ............[OKAY] [NO] ....... [OKAY]transformer sparse_attn ............ sparse_attn ............ transformer[NO] [NO]............ ............ ....... [NO] .......[OKAY] [NO]....... [OKAY] ....... stochastic_transformer [OKAY] [OKAY] transformer . ............[NO]transformer stochastic_transformer.......[NO]............ [OKAY] ........ [NO] [NO][OKAY] .............. [OKAY][OKAY] stochastic_transformer . [NO] stochastic_transformer....... [OKAY]. [NO] ....... 
[OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
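In this table the first column ([YES]/[NO]) says whether an op was pre-built into the installed wheel, and the second ([OKAY]) says the system can JIT-compile it with ninja on demand. The same check can be run per op from Python; a sketch, assuming the deepspeed.ops.op_builder API of this 0.4.x build (builder class names, the NAME attribute, and is_compatible() are all assumptions about that API):

    # Sketch, assuming the DeepSpeed 0.4.x op_builder API: each
    # extension op has a builder whose is_compatible() mirrors the
    # [OKAY] column of the report above.
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # NAME is the op name as printed in the report (cpu_adam, ...)
        print(builder.NAME, "compatible:", builder.is_compatible())

Only cpu_adam was pre-built here; everything else relies on JIT compilation, which is why the report insists on ninja being available.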
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[... identical environment info from the remaining ranks omitted ...]
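Each line of the environment report can be reproduced by hand, which is handy when verifying that all ranks really see the same conda env; a minimal sketch using only standard torch/deepspeed attributes:

    # Minimal sketch reproducing the "general environment info" lines
    # with standard attributes (no DeepSpeed internals assumed).
    import torch
    import deepspeed

    print("torch install path ...", list(torch.__path__))   # the bracketed list above
    print("torch version ........", torch.__version__)      # 1.8.1 above
    print("torch cuda version ...", torch.version.cuda)     # 11.1 above
    print("deepspeed info .......", deepspeed.__version__)  # 0.4.2+bc17042 above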
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[... the same git message appears once per rank ...]
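git is not installed on the compute nodes, so Megatron falls back to "unknown" for the hash and branch; this is harmless for training and only affects what gets stamped into the logs. A hedged sketch of the usual fallback pattern behind such a line (not Megatron's exact code):

    # Hedged sketch of the common "git info with fallback" pattern;
    # Megatron's actual implementation may differ.
    import subprocess

    def git_info(default="unknown"):
        try:
            out = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"],
                stderr=subprocess.DEVNULL,
            )
            return out.decode().strip()
        except (OSError, subprocess.CalledProcessError):
            # git missing (as on these nodes) or not a git checkout
            return default

    print(f"**** Git info for Megatron: git_hash={git_info()} ****")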
[OKAY]..................[OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name op name................op name................ ................................ installedinstalledinstalled installed .. .... .. compatiblecompatible compatible compatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam...............cpu_adam ...............[YES]...............[YES] ......[YES] [YES]...... [OKAY] ...... ...... [OKAY][OKAY] [OKAY] fused_adam ............. [NO]fused_adam fused_adam....... ............. ............. fused_adam[OKAY][NO] .......[NO]............. [OKAY]fused_lamb.......[NO] ............. [OKAY]fused_lamb ....... [NO] .............[OKAY]....... [NO]fused_lamb fused_lamb[OKAY] ....... .......................... [NO][OKAY][NO] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ transformer[NO] ................... [NO]sparse_attnsparse_attn [OKAY] ............................... transformer[OKAY][NO] [NO]................... .......stochastic_transformer[OKAY][NO] [OKAY]....... . transformer[OKAY][NO] transformer ............ .......stochastic_transformer[NO]............ [NO][OKAY]........ .......[OKAY][NO] [OKAY]....... [OKAY]stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info ...................deepspeed wheel compiled w. 0.4.2+bc17042, bc17042, big-science...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. [OKAY] .................. [OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name................op name ................................installed................ ..installed installed installed .. compatible.. ..compatible -------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam ...............cpu_adam[YES] cpu_adam ...............[YES] ...........................[YES] [YES][OKAY][OKAY] ...... ...... [OKAY] [OKAY] fused_adamfused_adam .............fused_adam............. [NO] .............fused_adam [NO] .......[NO] ....................[OKAY] ....... [OKAY][NO] [OKAY]....... fused_lamb fused_lamb [OKAY] ............. .............fused_lamb[NO] fused_lamb[NO] .................... ............. ....... [OKAY][NO][NO] [OKAY] ....... .......[OKAY] [OKAY] sparse_attn ............sparse_attn sparse_attn [NO] ............sparse_attn ............................... [NO][OKAY][NO][NO] .....................transformer [OKAY][OKAY][OKAY] ............ [NO]transformer transformer transformer....... ............ ............ ............[OKAY] [NO][NO] [NO] ....... ....... ....... stochastic_transformer [OKAY] [OKAY][OKAY] . [NO] stochastic_transformer.......stochastic_transformerstochastic_transformer [OKAY]. .. [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op nameop name op name ................................op name................ installed................installedinstalled ....installed .. compatible ..compatible compatible-------------------------------------------------- ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ...............cpu_adam [YES] [YES] ...............[YES]............ [YES]......[OKAY][OKAY] [OKAY] ...... [OKAY] fused_adam fused_adam.............fused_adam ............. ............. fused_adam[NO][NO][NO] .................................. [OKAY] [NO] [OKAY][OKAY] .......fused_lamb fused_lamb [OKAY]fused_lamb............. ............. .............[NO] fused_lamb [NO][NO] ....... ............. ....... .......[OKAY] [NO][OKAY][OKAY] ....... [OKAY] sparse_attnsparse_attn sparse_attn ............ ............ sparse_attn............ [NO] [NO] [NO]............ ....... ....... ....... [OKAY][NO][OKAY][OKAY] .......transformer transformertransformer[OKAY]............ ............ ............[NO][NO]transformer ....... .......[NO]............ .......[OKAY][NO][OKAY] [OKAY]....... stochastic_transformer [OKAY]stochastic_transformer . stochastic_transformer [NO] . stochastic_transformer ........ [NO] [OKAY][NO]........ .......[OKAY][NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] .................... 1.8.1torch version ....................torch cuda version 1.8.1............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path deepspeed info........... ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']0.4.2+bc17042, bc17042, big-science deepspeed info deepspeed wheel compiled w.................... 
......0.4.2+bc17042, bc17042, big-science torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ninja...................................................... [OKAY] [OKAY][OKAY] .................. ----------------------------------------------------------------------------------------------------[OKAY]-------------------------------------------------- op name op name--------------------------------------------------op name................ ................ installedop name................installed ....................installed compatible compatibleinstalled..-------------------------------------------------- --------------------------------------------------..compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]cpu_adam...... ......cpu_adam............... [OKAY][OKAY] [YES]............... [YES]...... ......[OKAY] [OKAY]fused_adam fused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_adam fused_adam............. fused_lambfused_lamb .......................... [NO] .............[NO] [NO].......[NO]....... [OKAY]..............[OKAY] [OKAY][OKAY] ninjaninjaninja ninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY] fused_lamb .............fused_lamb [NO]............. 
.......[NO] .......sparse_attn[OKAY] sparse_attn [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ............ ............[NO] [NO]....... .......[OKAY] [OKAY] op nameop name op nameop name ................ ................................................ installedinstalledinstalledinstalled .. .... ..compatiblecompatiblecompatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- transformertransformer sparse_attn ........................sparse_attn............ [NO] [NO]............[NO] [NO]..................... .......[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- transformerstochastic_transformer stochastic_transformer............transformer . [NO]. ............ [NO][NO]....... [NO] .............. [OKAY] .......[OKAY] cpu_adamcpu_adam ..............................cpu_adam cpu_adam [YES] ...............[YES] ...............[YES]............ ...... [YES][OKAY] [OKAY][OKAY] [OKAY][OKAY]...... [OKAY] stochastic_transformer stochastic_transformer. [NO] . .......[NO] .......[OKAY] fused_adam ............. [NO] .......fused_adam fused_adam[OKAY]fused_adam [OKAY] .......................... ............. [NO][NO] fused_lamb [NO]........................... [OKAY]....... [OKAY] [NO] [OKAY]....... [OKAY]fused_lambfused_lamb fused_lamb ....................................... [NO][NO][NO] ....... ....... ....... sparse_attn[OKAY] [OKAY] [OKAY] ............ [NO] ....... [OKAY] transformer ............ [NO]sparse_attn sparse_attnsparse_attn ....... ............ ........................[OKAY] [NO][NO][NO] .......stochastic_transformer .............. [OKAY] [OKAY].[OKAY] transformer[NO] ............transformer transformer....... [NO] ............ ...................[NO][OKAY] [NO] [OKAY]....... .......[OKAY] [OKAY] stochastic_transformer stochastic_transformer .stochastic_transformer . [NO] [NO]........ [OKAY].......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. 
[NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] DeepSpeed general environment info:deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch install pathtorch 1.8, cuda 11.1 ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.1 -------------------------------------------------- torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name op name................................................ ................installed installedinstalled.. installed....compatible .. compatible compatible-------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam[YES].............................. ...... ...............[YES] [YES] [OKAY] [YES]............ ......[OKAY][OKAY] [OKAY] fused_adam ............. [NO]fused_adamfused_adam fused_adam............. ....... ............. .............[OKAY] [NO] [NO] [NO] ....... ....... ....... [OKAY] fused_lamb[OKAY][OKAY] .............fused_lamb fused_lamb [NO]fused_lamb ................................. ............. [NO][OKAY] [NO][NO] ..................... [OKAY] [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn sparse_attn sparse_attn transformer............ ............ ............[NO] ............[NO] [NO] ....... .......[NO] [OKAY]....... .......[OKAY][OKAY] transformer [OKAY]............ transformer [NO]............ stochastic_transformer transformer....... [NO] . ............ [OKAY].......[NO] .......[NO] [OKAY] [OKAY] .......stochastic_transformer stochastic_transformer [OKAY] .. [NO]stochastic_transformer[NO] ............... [OKAY][OKAY][NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] .......utils [NO].................. [YES] ...... [OKAY] quantizer .............. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
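The report above can be regenerated at any time with DeepSpeed's `ds_report` utility, and ops listed as [NO] can be pre-compiled at install time rather than JIT-compiled via ninja on first use. A minimal sketch, assuming the training conda environment is active; which DS_BUILD_* ops to enable is a per-setup choice:

    # print the same op/compatibility and environment report as above
    ds_report

    # pre-build selected ops instead of relying on runtime JIT + ninja,
    # e.g. cpu_adam and fused_adam (needs the nvcc/CUDA toolchain at install time)
    DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 pip install deepspeed

The async_io warning is harmless here since that op is unused; on a machine with root access, `apt install libaio-dev` (as the warning suggests) makes it buildable.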
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................................... .................. .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................................................................ installedinstalled installed .. installed.... .. compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adamcpu_adamcpu_adam ............... .............................. ............... [YES][YES] [YES][YES] ........................ [OKAY] [OKAY] [OKAY][OKAY] fused_adamfused_adam fused_adamfused_adam ............. ..........................[NO] ............. [NO].......[NO] .......[OKAY].......[NO] [OKAY] .......[OKAY] fused_lamb [OKAY] fused_lamb ............. .............fused_lamb[NO] [NO]fused_lamb.................... ....... .............[OKAY][NO] [OKAY][NO]....... .......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO]sparse_attnsparse_attn ............[NO]................... .......[OKAY][NO][NO] [OKAY].............. transformer[OKAY][OKAY] transformer ............transformer............ transformer [NO] ............[NO] ............ ....... [NO]....... [NO] [OKAY] [OKAY] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer stochastic_transformer.stochastic_transformer. .[NO][NO] ........[NO]....... [NO][OKAY].......[OKAY] ....... [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]async_io async_io............... -------------------------------------------------- ............... [NO] [NO]....... .......[NO] [NO] transformer_inference ..transformer_inference [NO] ......... [NO][OKAY] ....... [OKAY] DeepSpeed general environment info: utils .................. utils[YES] ........................ [OKAY][YES] ...... [OKAY] DeepSpeed general environment info:torch install path ............... quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2 deepspeed infodeepspeed install path .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ installed................ ..installed compatible ..-------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam ............. fused_adam[NO] .................... [OKAY] [NO] ....... [OKAY]fused_lamb ............. [NO] fused_lamb....... [OKAY]............. [NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attn ....... ............[OKAY] [NO] .......transformer [OKAY]............ [NO] transformer....... ............[OKAY] [NO] ....... stochastic_transformer[OKAY] . [NO] .......stochastic_transformer [OKAY] . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................ ................................ ................installed installed installedinstalled.... ..compatible .. compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam ............... ............... .....................[YES] [YES][OKAY][YES]...... [OKAY]............ [OKAY][OKAY] fused_adam fused_adam.............fused_adam fused_adam............. [NO] ............. [NO]............. .......[NO] ....... [NO][OKAY]....... .......[OKAY][OKAY] [OKAY]fused_lamb .............fused_lambfused_lamb fused_lamb[NO].......................... ....................[NO][NO] [OKAY][NO]....... ....... [OKAY] ....... /bin/sh: line 0: type: git: not found [OKAY] [OKAY] sparse_attn ............sparse_attn [NO] sparse_attnsparse_attn................... [NO]............ [OKAY] ............ .......[NO] [NO][OKAY].......transformer ...................[OKAY] transformer [OKAY][NO] ...................transformertransformer [OKAY] [NO] ............ ............ ....... [NO] [NO] [OKAY] ....... ....... stochastic_transformer [OKAY] [OKAY]stochastic_transformer . [NO]stochastic_transformer. stochastic_transformer.......[NO] .[OKAY]....... . [NO][OKAY][NO] .............. [OKAY][OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja .................................... ..................[OKAY] .................. [OKAY][OKAY] --------------------------------------------------[OKAY]-------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------................ op nameop name op name ................installed................ installed................installed.. .. ..installed compatiblecompatible compatible..---------------------------------------------------------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam cpu_adam[YES][YES]............... ...........................[YES] [YES][OKAY]...... [OKAY] ......[OKAY] [OKAY] fused_adamfused_adam ..........................fused_adam fused_adam[NO].............[NO] .................... [NO] ....... [NO][OKAY] ....... [OKAY].......[OKAY] [OKAY]fused_lamb .............fused_lamb fused_lamb fused_lamb[NO] .......................... ....................[NO][NO] [OKAY].............. [NO] [OKAY] [OKAY] ....... [OKAY] sparse_attn ............ [NO]sparse_attn .......sparse_attn ............ sparse_attn[OKAY] ............ ............[NO] [NO][NO] transformer.............. .......[OKAY][OKAY]............ [OKAY]transformer[NO] transformer................... transformer............[NO][OKAY] .......[NO]............ stochastic_transformer[NO][OKAY] ....... [OKAY]........ stochastic_transformer [OKAY] [NO] stochastic_transformer....... . stochastic_transformer .[OKAY] [NO] [NO]........ .......[NO][OKAY] [OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
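async_io backs DeepSpeed's NVMe/CPU offload paths and is optional for this run, but the warning is simple to clear. A sketch of the fix on a Debian-style image, assuming root and that rebuilding DeepSpeed is acceptable (DS_BUILD_AIO is the build flag for this op):

    # Install the missing AIO headers, then rebuild with async_io enabled.
    apt install -y libaio-dev
    DS_BUILD_AIO=1 pip install --no-cache-dir --force-reinstall deepspeed

On clusters without root, the usual workaround is a user-prefix libaio (e.g. from conda-forge) with CFLAGS/LDFLAGS pointed at it before the rebuild.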
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
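This environment block can be reproduced on a login node before submitting, a cheap pre-flight check for torch/CUDA/toolchain skew (here the wheel targets CUDA 11.1 while the system nvcc is 11.2; the run proceeds, so this minor-version skew is tolerated, but it is the first thing to check when JIT builds fail). A small sketch:

    # Compact version of the report above: torch build vs. local toolchain.
    python -c "import torch; print(torch.__version__, torch.version.cuda)"
    nvcc --version | tail -n1
    # Full report, including the op-compatibility table:
    ds_report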
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
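The `type: git: not found` lines mean the compute-node shell has no git on PATH, so Megatron records git_hash=unknown and the exact code revision never makes it into the training logs. A sketch of two workarounds, assuming a standard git CLI; the module name and the LOGDIR variable below are placeholders, not part of Megatron:

    # Option 1: put git on the compute nodes' PATH before launching
    # (module name is site-specific).
    module load git

    # Option 2: capture the revision at submission time, where git exists,
    # so it is preserved even if the nodes never see git.
    git rev-parse --short HEAD      > "$LOGDIR/git_hash.txt"
    git rev-parse --abbrev-ref HEAD > "$LOGDIR/git_branch.txt"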
[OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name op name................................................ ................installedinstalledinstalled installed...... compatible..compatiblecompatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam............................................. ...............[YES][YES][YES] ......[YES]............ ......[OKAY][OKAY] [OKAY] [OKAY] fused_adamfused_adam fused_adamfused_adam ............. ............. [NO].............[NO]............. .......[NO][NO]....... [OKAY].......[OKAY] ....... [OKAY] [OKAY]fused_lamb fused_lamb .............fused_lamb............. [NO]fused_lamb[NO]............. ...........................[NO] [NO][OKAY][OKAY]....... .......[OKAY] [OKAY] sparse_attnsparse_attn ........................sparse_attn sparse_attn[NO][NO]............ ...................[NO]....... [OKAY][NO][OKAY]....... .......[OKAY] transformertransformer[OKAY] ........................transformer [NO]transformer[NO]............ ...................[NO]....... [NO][OKAY].......[OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer.. [NO][NO]. ...............[NO] [NO][OKAY][OKAY]....... .......[OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name................ op name................ installed ................................ installed .. ..installed installed compatible compatible .... -------------------------------------------------- --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... ............... .....................[YES] [YES][OKAY]......[YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ................................. [NO][NO] [OKAY] ....... ............. ....... [OKAY] fused_lamb [NO][OKAY] ............. fused_lamb.......[NO] fused_lamb ............. .......[OKAY] ............. [NO] [OKAY] [NO]fused_lamb....... ....................[OKAY] [NO] .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformersparse_attnsparse_attn sparse_attn............ ............ ............[NO]............[NO] ..............[NO] [NO] [OKAY][OKAY].............. [OKAY][OKAY] transformer stochastic_transformer............ transformertransformer [NO] . ............ ...................[NO] [OKAY].......[NO][NO] [OKAY] ....... ....... stochastic_transformer[OKAY][OKAY] . [NO] .......stochastic_transformerstochastic_transformer [OKAY] .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja .................................... .................................... 
[OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................ ................ ................................ installedinstalled installed ....installed .. compatiblecompatible .. compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... ...............[YES] [YES] ...............[YES]...... ......[OKAY]......[YES] [OKAY][OKAY]...... [OKAY] fused_adam .............fused_adam fused_adam[NO]fused_adam ............. .......................... [NO]....... [NO][OKAY][NO] ....... ..............fused_lamb[OKAY] .............[OKAY][OKAY] fused_lamb[NO] fused_lamb.............fused_lamb....... [NO] [OKAY].......................... ....... [NO][NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn ............ sparse_attnsparse_attn [NO]transformer ........................................... [NO] [NO][OKAY][NO] .............. transformer[OKAY].......[OKAY] ............[OKAY] transformerstochastic_transformer [NO] .............transformer....... [NO] [NO][OKAY] ............ ....... [NO].......stochastic_transformer[OKAY] .......[OKAY] .[OKAY] [NO]stochastic_transformer .......stochastic_transformer . [OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op name op name................ op name................................installed ..................installedinstalled installed.. compatible.. compatible DeepSpeed general environment info: compatible ..-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 cpu_adamcpu_adamcpu_adam cpu_adam ............... .............................. ............... [YES][YES] [YES] [YES] ............ [OKAY][OKAY]...... ...... [OKAY] [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] fused_adam fused_adam.............fused_adam fused_adam[NO]............. .................... ............. [NO][NO] [NO] [OKAY] ..................... deepspeed info ................... 0.4.2+bc17042, bc17042, big-science [OKAY][OKAY][OKAY]fused_lamb deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ............. fused_lambfused_lambfused_lamb[NO] .............................................. [NO] [NO][OKAY][NO]....... .......[OKAY]....... [OKAY] [OKAY] /bin/sh: line 0: type: git: not found sparse_attn ............ [NO] ....... [OKAY] sparse_attn sparse_attnsparse_attn............transformer ............[NO] ............[NO]................... [NO][NO][OKAY]....... ..............[OKAY] transformer [OKAY][OKAY]............ /bin/sh: line 0: type: git: not found [NO]transformer transformerstochastic_transformer................... [OKAY]............[NO] . [NO]....... [NO]stochastic_transformer....... [OKAY].......[OKAY] . [OKAY] [NO]stochastic_transformer stochastic_transformer....... .[OKAY] . [NO] [NO]....... 
.......[OKAY] [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ninjaninjaninja ninja .................................... .................. .................. [OKAY][OKAY] [OKAY][OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch version .................... 1.8.1 -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op nameop name................ op name ................ ................installed ................ installedinstalledinstalled .. .. .. .. compatiblecompatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found op name op name op name................ ................ ................ installed ................installed installed .. ..installed .. compatible ..compatible compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- cpu_adam cpu_adamcpu_adam ............... ...............cpu_adam ............... [YES] [YES][YES]..................... ......[YES]......[OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 cpu_adam cpu_adam............... cpu_adam[YES]............... cpu_adam ...............[YES]...... ............... [OKAY][YES]......[YES] ......[OKAY]...... [OKAY]...... [OKAY] [OKAY] [OKAY] [OKAY] fused_adam ............. fused_adam[NO] fused_adam.................... fused_adam.............[OKAY][NO] ............. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adam fused_adam [NO]....... [NO].......fused_lamb [OKAY] ............. .......[OKAY] [NO][OKAY]fused_lamb .............[NO] .............fused_lamb....... [NO] [NO] [OKAY]............. ....... ....... .............[OKAY] fused_lamb[NO] .......[NO] fused_lamb [OKAY][OKAY]....... .............[OKAY] fused_lamb ................................. 
[NO][OKAY][NO] ....... .......[OKAY] [OKAY] [NO]fused_lamb fused_lamb.................... .............[OKAY][NO] [NO].......sparse_attn .......[OKAY]............ sparse_attn ............ [NO] sparse_attn....... ............[OKAY] [OKAY][NO] ....... [OKAY] [NO] ....... sparse_attn[OKAY] transformer............ sparse_attn transformer............ ............[NO] [NO]....... sparse_attn.......[OKAY]sparse_attn ............[OKAY]............ transformer sparse_attn............[NO] ............[NO].......transformer ............[NO].......[OKAY] [NO][OKAY]....... .......transformer[OKAY] [NO] [NO] ............ .......stochastic_transformer ....... [OKAY] [NO] .[OKAY]....... [NO][OKAY] stochastic_transformer[OKAY]............ [NO]transformer. .......stochastic_transformer............[NO] [OKAY][NO]........ transformertransformer....... ............[OKAY] stochastic_transformer............[NO] [NO]........ .......[OKAY][NO] [NO].......[OKAY] .......[OKAY]stochastic_transformer [OKAY] [OKAY]....... [OKAY] . [NO]stochastic_transformer ....... [OKAY]. stochastic_transformer stochastic_transformer. [NO] ........ [NO][OKAY] [NO] ....... [OKAY] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name ................op name ................ ................installed................ installed installed installed ...... ..compatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam ..............................cpu_adam............... [YES] [YES][YES] ............... .................. [YES] [OKAY][OKAY] [OKAY]...... [OKAY] fused_adamfused_adamfused_adam .............fused_adam ............. [NO] ............. .............[NO]....... [NO].......[OKAY][NO] ....... [OKAY] ....... [OKAY] fused_lamb [OKAY] fused_lamb.............fused_lamb fused_lamb [NO]....................................... .......[NO][NO][NO] .............. [OKAY]....... [OKAY] [OKAY][OKAY] sparse_attnsparse_attnsparse_attn ............sparse_attn............ ........................ [NO][NO][NO][NO] ............................ [OKAY][OKAY][OKAY] [OKAY] transformer ............transformertransformer transformer [NO]............ ............ [NO]............ ....... [NO]....... [OKAY][NO] [OKAY] ....... ....... [OKAY]stochastic_transformerstochastic_transformer[OKAY] .. stochastic_transformer[NO]stochastic_transformer[NO] ............... .[OKAY][NO][OKAY] [NO]....... [OKAY] ....... [OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY] [OKAY][OKAY]-------------------------------------------------- ----------------------------------------------------------------------------------------------------op name-------------------------------------------------- op nameop name................ op name ................................installed ................ installed..installed installed.. compatible ....compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... ...............[YES]...............[YES] ............[YES] [YES] [OKAY] [OKAY]............ [OKAY][OKAY] fused_adamfused_adam fused_adam..........................fused_adam [NO] [NO].......................... [NO] .............. [NO] [OKAY][OKAY]....... .......[OKAY] fused_lamb[OKAY] fused_lamb............. fused_lamb fused_lamb[NO].......................... .......[NO].............[NO] [OKAY]....... [NO][OKAY]....... .......[OKAY] [OKAY] sparse_attnsparse_attn ............sparse_attn............ sparse_attn [NO][NO] ............ .......................... [OKAY][NO][NO][OKAY] .............. transformer[OKAY]transformer[OKAY] ........................ transformer [NO]transformer [NO] ................... ............ .......[OKAY] [NO] [NO][OKAY] ..............stochastic_transformer [OKAY]stochastic_transformer[OKAY] . .[NO] stochastic_transformer[NO]stochastic_transformer ....... ........ . [OKAY] [NO][OKAY] [NO]....... .......[OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc versionDeepSpeed general environment info: .......................................... 11.211.2 deepspeed install pathdeepspeed install path torch install path...................... ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... torch version ...... torch 1.8, cuda 11.1 .................... torch 1.8, cuda 11.1 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found ninjaninjaninja ninja ...................................................... [OKAY] [OKAY].................. [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- op name-------------------------------------------------- op name --------------------------------------------------................ op name **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ................installed ................installedop name .. installed.. ................ compatible..compatible installed----------------------------------------------------------------------------------------------------compatible ..-------------------------------------------------- compatible cpu_adam-------------------------------------------------- cpu_adam ............... ...............cpu_adam[YES] [YES]..................... ......[YES][OKAY] [OKAY]cpu_adam...... [OKAY]............... [YES] fused_adam...... .............fused_adam[OKAY] fused_adam[NO]............. ....................[NO] [NO].......[OKAY] ....... [OKAY] [OKAY] fused_lambfused_lambfused_lamb fused_adam .......................... .............[NO].............[NO] [NO]....... ..............[NO][OKAY] [OKAY] [OKAY] ....... [OKAY] fused_lamb ............. sparse_attn[NO] sparse_attn............sparse_attn ............[NO]................... [NO]....... [NO] [OKAY] [OKAY].............. [OKAY][OKAY] transformer transformer............transformer ............[NO]............ .......[NO][NO] [OKAY].............. sparse_attn[OKAY][OKAY] stochastic_transformer............ stochastic_transformerstochastic_transformer [NO]. . .[NO] ....... [NO][NO] .......[OKAY].............. [OKAY][OKAY][OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................DeepSpeed general environment info: DeepSpeed general environment info:0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... 
...............11.1 11.1nvcc version .....................nvcc version 11.2..................... 11.2deepspeed install path ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io async_io............... [NO]............... [NO]....... .......[NO] [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] torch version .................... 1.8.1 ....... [OKAY] torch cuda version ............... 11.1 utils .................. utils[YES] ........................ [OKAY][YES] nvcc version ..................... 11.2 ...... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... 
.......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 transformer_inference .. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] utils .................. [YES] ...... [OKAY] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils async_io.................. ...............[YES] [NO]...... .......[OKAY] [NO] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 transformer_inference .. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science utils .................. [YES] ...... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_io ............... [NO] ....... [NO] torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] torch cuda version ............... 11.1 utils .................. [YES] ...... [OKAY] nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
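DeepSpeed emits this report (the same text the `ds_report` utility prints) once per launched process, so in the raw console capture the identical blocks appear interleaved many times; the content is the same for every rank. The only flagged gap is `async_io`, which needs the libaio system library that these compute nodes don't ship. A minimal sketch of how one might probe for the runtime library from Python before resubmitting a job; the library name "aio" is an assumption based on the conventional naming from the libaio/libaio-dev packages:

```python
# Sketch: check whether the libaio shared library needed by DeepSpeed's
# async_io op can be resolved on this node. Assumes the runtime library
# is named "aio" (as shipped by the libaio / libaio-dev packages).
import ctypes.util

libaio = ctypes.util.find_library("aio")
if libaio is None:
    # Matches the condition behind the [WARNING] above: async_io stays [NO].
    print("libaio not found; on Debian/Ubuntu: apt install libaio-dev")
else:
    print(f"libaio found: {libaio}")
```

Since async_io is only needed for offloading to NVMe, the warning is harmless for this training setup and can be ignored.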
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:torch version .................... 1.8.1torch install path ...............torch cuda version torch install path ............... 11.1............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']....................deepspeed install path 1.8.1........... torch version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version.................... deepspeed info1.8.1............... ...................11.1 torch cuda version 0.4.2+bc17042, bc17042, big-science nvcc version ............... deepspeed wheel compiled w......................11.1 ......11.2 nvcc versiontorch 1.8, cuda 11.1deepspeed install path ................................ 11.2 deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w.................... ...... 0.4.2+bc17042, bc17042, big-sciencetorch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................. ....................................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name --------------------------------------------------................op nameop name installed................ ................op nameinstalled.. installed..compatible ..................compatible-------------------------------------------------- compatible installed -------------------------------------------------- --------------------------------------------------.. cpu_adam ...............compatible [YES]cpu_adam-------------------------------------------------- .....................cpu_adam ............... [YES][OKAY][YES] ............ [OKAY][OKAY] cpu_adam ............... [YES] ......fused_adam [OKAY].............fused_adam fused_adam[NO]............. ....................[NO] [OKAY][NO]....... .......[OKAY] [OKAY]fused_lambfused_adam fused_lamb .......................... fused_lamb [NO] .............[NO]............. [NO] .............. [NO] ....... .......[OKAY] [OKAY] [OKAY][OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attnsparse_attn sparse_attn ............ ........................ [NO] [NO] [NO] .......sparse_attn ....... ................... [OKAY] [OKAY][OKAY][NO] transformer transformertransformer............ ....... ........................[NO][OKAY] [NO][NO]....... ..............[OKAY] transformer[OKAY][OKAY] stochastic_transformer............ stochastic_transformer stochastic_transformer [NO]. ..[NO] ....... [NO][NO] ....... [OKAY].............. [OKAY][OKAY][OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... 
...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ...... torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inferenceutils .................... [NO][YES] ............. [OKAY][OKAY] quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] [OKAY] --------------------------------------------------quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY].......  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] [OKAY]quantizer async_io ............... [NO] ....... [NO] .............. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed general environment info: -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] torch version .................... 1.8.1 transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] torch cuda version ............... 11.1 utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] nvcc version ..................... 11.2 ....... [OKAY] -------------------------------------------------- -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system utils .................. [YES] ...... [OKAY] meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 
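The op-compatibility table and environment summary above are the output of DeepSpeed's bundled `ds_report` command, emitted once per launched process. To re-check op status outside a training job, a minimal sketch, assuming the `tr1-13B` conda environment implied by the torch install path above:

    conda activate tr1-13B   # environment name assumed from the install path; adjust as needed
    ds_report                # prints the same op report and general environment info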
> setting tensorboard ...
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.async_io ............... [NO] ....... [NO] async_iotransformer_inference ................. [NO][NO] .............. [NO][OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. [NO] utils....... ..................[OKAY] [YES] ......-------------------------------------------------- [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path DeepSpeed general environment info:........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ...................torch install path 0.4.2+bc17042, bc17042, big-science ............... deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'].................... 1.8.1 torch version torch cuda version.................... ...............1.8.1 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................torch install path 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']nvcc version ..................... 11.2torch version deepspeed install path.................... ...........1.8.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']torch cuda version deepspeed info............... ...................11.1 nvcc version0.4.2+bc17042, bc17042, big-science .....................deepspeed wheel compiled w. 11.2...... deepspeed install pathtorch 1.8, cuda 11.1 ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer --------------------------------------------------.............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version ............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO] [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installedinstalledinstalledinstalled ........ compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam .............................. ............... ............... [YES][YES] [YES] [YES] ............ ............ [OKAY][OKAY][OKAY][OKAY] fused_adamfused_adamfused_adam fused_adam............. ............. ............. .............[NO][NO] [NO][NO].............. ..............[OKAY][OKAY] [OKAY][OKAY] fused_lambfused_lambfused_lamb ..........................fused_lamb............. [NO] [NO] [NO]............. ....... ....... .......[NO] [OKAY] [OKAY][OKAY] ....... [OKAY] sparse_attnsparse_attnsparse_attn .................................... [NO]sparse_attn[NO][NO] ....... ............ .............. [OKAY] [NO] [OKAY][OKAY] ....... transformer[OKAY] transformer transformer............ ........................[NO] [NO][NO].......transformer ..............[OKAY]............ [OKAY][OKAY][NO] ....... stochastic_transformer[OKAY] stochastic_transformerstochastic_transformer . [NO].. stochastic_transformer .......[NO] [NO] [OKAY]........ .......[NO][OKAY] [OKAY]....... [OKAY] DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install pathDeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version ....................torch install path 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... torch cuda versiontorch version ................................... 11.11.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version torch cuda version..................... torch version ............... 11.2 .................... 11.1 deepspeed install path1.8.1nvcc version ................................ torch cuda version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.2............... 
deepspeed info11.1deepspeed install path ..............................nvcc version 0.4.2+bc17042, bc17042, big-science..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.11.2 deepspeed info......deepspeed install path ...................torch 1.8, cuda 11.1........... 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name-------------------------------------------------- ................ op name op name................installed .................................. installed installed installed ..compatible.. ..compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam.............................. [OKAY][YES][YES]............... ............ [YES] [OKAY] [OKAY] ...... fused_adam[OKAY] ............. [NO] ....... [OKAY]fused_adam fused_adam .......................... fused_lamb[NO][NO] fused_adam .................... ....... [OKAY]............. [NO] [OKAY] fused_lamb [NO] .................... [OKAY]fused_lamb[NO] ........................... [NO][OKAY] [OKAY] ....... [OKAY] fused_lambsparse_attn ......................... [NO]sparse_attn [NO] ....... ............ .......[OKAY]sparse_attn[NO] [OKAY].......transformer............ [OKAY] ............[NO] [NO]....... transformer ....... [OKAY] ............ [OKAY] [NO]transformer ................... sparse_attn[OKAY][NO]stochastic_transformer ...................stochastic_transformer . [OKAY] [NO].[NO] .......stochastic_transformer [OKAY][NO] ............... [OKAY][OKAY][NO] ....... [OKAY]transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path DeepSpeed general environment info:...............DeepSpeed general environment info: DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch install pathtorch install path torch install path ..............................torch version ............... .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']torch cuda version['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ............... 11.1torch versiontorch versiontorch version ....................nvcc version........................................ 1.8.1.....................1.8.11.8.1 11.2torch cuda version torch cuda versiontorch cuda version deepspeed install path............... ............... ............... ...........11.1 11.1 11.1 nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']nvcc version nvcc version ..................... ..................... deepspeed info..................... 11.2 11.211.2 ................... 
deepspeed install path deepspeed install pathdeepspeed install path 0.4.2+bc17042, bc17042, big-science................................. deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']...... deepspeed infodeepspeed infodeepspeed infotorch 1.8, cuda 11.1 ......................................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. deepspeed wheel compiled w. ...... ...... ...... torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op nameop name ................ ................................ ................ installed installed installedinstalled .. ....compatible.. compatiblecompatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam cpu_adam ...... .............................. ............... [YES][OKAY] [YES] ......[YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam [NO]fused_lamb.......................... ....................[NO][NO] [OKAY].............. [NO] [OKAY][OKAY]....... fused_lamb [OKAY]............. fused_lambfused_lamb[NO] ................................. [NO][OKAY][NO] ....... sparse_attn ....... [OKAY]............ [OKAY][NO] ....... [OKAY] sparse_attn ............transformer [NO]............ sparse_attn[NO]....... 
sparse_attn............[OKAY]....... ............[OKAY][NO] transformer [NO] .......stochastic_transformer............ [NO] [OKAY] ....... ........ [OKAY]transformer[OKAY][NO] ................... transformer [NO] [OKAY] ............stochastic_transformer ....... [NO][OKAY] ........ [NO][OKAY] .......stochastic_transformer [OKAY]stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. This can be fixed with: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
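For reference, the block above is what DeepSpeed's bundled `ds_report` diagnostic tool prints; every rank emits its own copy, which is why the raw log repeats it. The torch/CUDA lines can also be reproduced directly with stock PyTorch APIs, as in this minimal sketch:

    # Minimal sketch: reproduce the torch/CUDA lines of the report above
    # with stock PyTorch APIs; the full op report comes from DeepSpeed's
    # bundled `ds_report` command-line tool.
    import torch

    print("torch version ....................", torch.__version__)   # 1.8.1 in this run
    print("torch cuda version ...............", torch.version.cuda)  # 11.1 in this run
    print("cuda available ...................", torch.cuda.is_available())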
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 42 ...
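The checkpointing INFO line just below reports a model parallel seed of 2760 for a base seed of 42. In this codebase the model-parallel cuda seed is derived from the base seed with a fixed offset of 2718 plus the tensor-model-parallel rank (hence 42 + 2718 + 0 = 2760 on rank 0), while data-parallel ranks keep the base seed. A small illustrative sketch of that arithmetic (the helper name is ours; the offset matches the logged value):

    # Illustrative sketch of the seed derivation behind the
    # model_parallel_cuda_manual_seed INFO line below: tensor-parallel
    # ranks get seed + 2718 + rank, data-parallel ranks keep the base seed.
    def model_parallel_seeds(seed, tensor_model_parallel_rank):
        offset = seed + 2718
        tensor_model_parallel_seed = offset + tensor_model_parallel_rank
        data_parallel_seed = seed
        return tensor_model_parallel_seed, data_parallel_seed

    assert model_parallel_seeds(42, 0) == (2760, 42)  # matches the log line below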
[2021-09-27 03:54:33,898] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.303 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
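Each Building/Loading pair above is torch's JIT extension machinery at work: the fused kernels are handed to torch.utils.cpp_extension.load, ninja rebuilds only what changed, and a warm build cache yields the "ninja: no work to do." lines. A minimal sketch of such a call, where the source file paths are assumptions inferred from the module name:

    # Minimal sketch of the JIT load behind "Building extension module
    # scaled_masked_softmax_cuda..."; the source paths are assumptions,
    # the API is torch.utils.cpp_extension.load.
    from torch.utils.cpp_extension import load

    scaled_masked_softmax_cuda = load(
        name="scaled_masked_softmax_cuda",
        sources=[
            "megatron/fused_kernels/scaled_masked_softmax.cpp",      # assumed path
            "megatron/fused_kernels/scaled_masked_softmax_cuda.cu",  # assumed path
        ],
        extra_cuda_cflags=["-O3"],
        verbose=True,  # prints the Emitting/Building/Loading lines seen in the log
    )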
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
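The compiler warning itself is printed once per rank and per extension build because torch.utils.cpp_extension defaults to the host compiler named c++, which on this machine is not the g++ that PyTorch was built with. The loader takes its compiler from the CXX environment variable, so exporting CXX=g++ before the first JIT build should silence the warning; a minimal sketch, assuming g++ is available in the conda environment:

    # Minimal sketch, assuming g++ is on PATH in this environment:
    # torch.utils.cpp_extension reads $CXX (falling back to `c++`)
    # when picking the host compiler, so set it before any JIT build.
    import os

    os.environ["CXX"] = "g++"  # must run before the first cpp_extension.load call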
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 20.521 seconds time to initialize megatron (seconds): -17.161 [after megatron is initialized] datetime: 2021-09-27 03:54:54 building GPT model ... 
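A note on the compiler warning above, which every rank prints while the fused kernels are JIT-compiled: torch.utils.cpp_extension picks the host compiler from the CXX environment variable and falls back to plain "c++", and that name can fail PyTorch's compiler-name check even when c++ is just a symlink to g++. A minimal sketch of a workaround, assuming the stock cpp_extension behavior (illustrative, not taken from the actual job scripts):

    import os

    # Hypothetical workaround, not from the original launcher: select g++
    # explicitly before Megatron JIT-compiles its fused kernels, so that
    # torch.utils.cpp_extension stops falling back to plain "c++".
    os.environ.setdefault("CXX", "g++")

Since the fused kernels do compile and load successfully right after the warning, it appears to be benign on this system.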
[2021-09-27 03:54:54,901] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-27 03:54:54,903] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 03:54:54,903] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.85 GB, percent = 19.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=0, data=8, model=0): 32, ProcessCoord(pipe=0, data=8, model=1): 33, ProcessCoord(pipe=0, data=8, model=2): 34, ProcessCoord(pipe=0, data=8, model=3): 35, ProcessCoord(pipe=0, data=9, model=0): 36, ProcessCoord(pipe=0, data=9, model=1): 37, ProcessCoord(pipe=0, data=9, model=2): 38, ProcessCoord(pipe=0, data=9, model=3): 39, ProcessCoord(pipe=0, data=10, model=0): 40, ProcessCoord(pipe=0, data=10, model=1): 41, ProcessCoord(pipe=0, data=10, model=2): 42, ProcessCoord(pipe=0, data=10, model=3): 43, ProcessCoord(pipe=0, data=11, model=0): 44, ProcessCoord(pipe=0, data=11, model=1): 45, ProcessCoord(pipe=0, data=11, model=2): 46, ProcessCoord(pipe=0, data=11, model=3): 47, ProcessCoord(pipe=0, data=12, model=0): 48, ProcessCoord(pipe=0, data=12, model=1): 49, ProcessCoord(pipe=0, data=12, model=2): 50, ProcessCoord(pipe=0, data=12, model=3): 51, ProcessCoord(pipe=0, data=13, model=0): 52, ProcessCoord(pipe=0, data=13, model=1): 53, ProcessCoord(pipe=0, data=13, model=2): 54, ProcessCoord(pipe=0, data=13, model=3): 55, ProcessCoord(pipe=0, data=14, model=0): 56, ProcessCoord(pipe=0, data=14, model=1): 57, ProcessCoord(pipe=0, data=14, model=2): 58, ProcessCoord(pipe=0, data=14, model=3): 59, ProcessCoord(pipe=0, data=15, model=0): 60, ProcessCoord(pipe=0, data=15, model=1): 61, ProcessCoord(pipe=0, data=15, model=2): 62, ProcessCoord(pipe=0, data=15, model=3): 63,
ProcessCoord(pipe=1, data=0, model=0): 64, ProcessCoord(pipe=1, data=0, model=1): 65, ProcessCoord(pipe=1, data=0, model=2): 66, ProcessCoord(pipe=1, data=0, model=3): 67, ProcessCoord(pipe=1, data=1, model=0): 68, ProcessCoord(pipe=1, data=1, model=1): 69, ProcessCoord(pipe=1, data=1, model=2): 70, ProcessCoord(pipe=1, data=1, model=3): 71, ProcessCoord(pipe=1, data=2, model=0): 72, ProcessCoord(pipe=1, data=2, model=1): 73, ProcessCoord(pipe=1, data=2, model=2): 74, ProcessCoord(pipe=1, data=2, model=3): 75, ProcessCoord(pipe=1, data=3, model=0): 76, ProcessCoord(pipe=1, data=3, model=1): 77, ProcessCoord(pipe=1, data=3, model=2): 78, ProcessCoord(pipe=1, data=3, model=3): 79, ProcessCoord(pipe=1, data=4, model=0): 80, ProcessCoord(pipe=1, data=4, model=1): 81, ProcessCoord(pipe=1, data=4, model=2): 82, ProcessCoord(pipe=1, data=4, model=3): 83, ProcessCoord(pipe=1, data=5, model=0): 84, ProcessCoord(pipe=1, data=5, model=1): 85, ProcessCoord(pipe=1, data=5, model=2): 86, ProcessCoord(pipe=1, data=5, model=3): 87, ProcessCoord(pipe=1, data=6, model=0): 88, ProcessCoord(pipe=1, data=6, model=1): 89, ProcessCoord(pipe=1, data=6, model=2): 90, ProcessCoord(pipe=1, data=6, model=3): 91, ProcessCoord(pipe=1, data=7, model=0): 92, ProcessCoord(pipe=1, data=7, model=1): 93, ProcessCoord(pipe=1, data=7, model=2): 94, ProcessCoord(pipe=1, data=7, model=3): 95, ProcessCoord(pipe=1, data=8, model=0): 96, ProcessCoord(pipe=1, data=8, model=1): 97, ProcessCoord(pipe=1, data=8, model=2): 98, ProcessCoord(pipe=1, data=8, model=3): 99, ProcessCoord(pipe=1, data=9, model=0): 100, ProcessCoord(pipe=1, data=9, model=1): 101, ProcessCoord(pipe=1, data=9, model=2): 102, ProcessCoord(pipe=1, data=9, model=3): 103, ProcessCoord(pipe=1, data=10, model=0): 104, ProcessCoord(pipe=1, data=10, model=1): 105, ProcessCoord(pipe=1, data=10, model=2): 106, ProcessCoord(pipe=1, data=10, model=3): 107, ProcessCoord(pipe=1, data=11, model=0): 108, ProcessCoord(pipe=1, data=11, model=1): 109, ProcessCoord(pipe=1, data=11, model=2): 110, ProcessCoord(pipe=1, data=11, model=3): 111, ProcessCoord(pipe=1, data=12, model=0): 112, ProcessCoord(pipe=1, data=12, model=1): 113, ProcessCoord(pipe=1, data=12, model=2): 114, ProcessCoord(pipe=1, data=12, model=3): 115, ProcessCoord(pipe=1, data=13, model=0): 116, ProcessCoord(pipe=1, data=13, model=1): 117, ProcessCoord(pipe=1, data=13, model=2): 118, ProcessCoord(pipe=1, data=13, model=3): 119, ProcessCoord(pipe=1, data=14, model=0): 120, ProcessCoord(pipe=1, data=14, model=1): 121, ProcessCoord(pipe=1, data=14, model=2): 122, ProcessCoord(pipe=1, data=14, model=3): 123, ProcessCoord(pipe=1, data=15, model=0): 124, ProcessCoord(pipe=1, data=15, model=1): 125, ProcessCoord(pipe=1, data=15, model=2): 126, ProcessCoord(pipe=1, data=15, model=3): 127, ProcessCoord(pipe=2, data=0, model=0): 128, ProcessCoord(pipe=2, data=0, model=1): 129, ProcessCoord(pipe=2, data=0, model=2): 130, ProcessCoord(pipe=2, data=0, model=3): 131, ProcessCoord(pipe=2, data=1, model=0): 132, ProcessCoord(pipe=2, data=1, model=1): 133, ProcessCoord(pipe=2, data=1, model=2): 134, ProcessCoord(pipe=2, data=1, model=3): 135, ProcessCoord(pipe=2, data=2, model=0): 136, ProcessCoord(pipe=2, data=2, model=1): 137, ProcessCoord(pipe=2, data=2, model=2): 138, ProcessCoord(pipe=2, data=2, model=3): 139, ProcessCoord(pipe=2, data=3, model=0): 140, ProcessCoord(pipe=2, data=3, model=1): 141, ProcessCoord(pipe=2, data=3, model=2): 142, ProcessCoord(pipe=2, data=3, model=3): 143, ProcessCoord(pipe=2, data=4, model=0): 144, 
ProcessCoord(pipe=2, data=4, model=1): 145, ProcessCoord(pipe=2, data=4, model=2): 146, ProcessCoord(pipe=2, data=4, model=3): 147, ProcessCoord(pipe=2, data=5, model=0): 148, ProcessCoord(pipe=2, data=5, model=1): 149, ProcessCoord(pipe=2, data=5, model=2): 150, ProcessCoord(pipe=2, data=5, model=3): 151, ProcessCoord(pipe=2, data=6, model=0): 152, ProcessCoord(pipe=2, data=6, model=1): 153, ProcessCoord(pipe=2, data=6, model=2): 154, ProcessCoord(pipe=2, data=6, model=3): 155, ProcessCoord(pipe=2, data=7, model=0): 156, ProcessCoord(pipe=2, data=7, model=1): 157, ProcessCoord(pipe=2, data=7, model=2): 158, ProcessCoord(pipe=2, data=7, model=3): 159, ProcessCoord(pipe=2, data=8, model=0): 160, ProcessCoord(pipe=2, data=8, model=1): 161, ProcessCoord(pipe=2, data=8, model=2): 162, ProcessCoord(pipe=2, data=8, model=3): 163, ProcessCoord(pipe=2, data=9, model=0): 164, ProcessCoord(pipe=2, data=9, model=1): 165, ProcessCoord(pipe=2, data=9, model=2): 166, ProcessCoord(pipe=2, data=9, model=3): 167, ProcessCoord(pipe=2, data=10, model=0): 168, ProcessCoord(pipe=2, data=10, model=1): 169, ProcessCoord(pipe=2, data=10, model=2): 170, ProcessCoord(pipe=2, data=10, model=3): 171, ProcessCoord(pipe=2, data=11, model=0): 172, ProcessCoord(pipe=2, data=11, model=1): 173, ProcessCoord(pipe=2, data=11, model=2): 174, ProcessCoord(pipe=2, data=11, model=3): 175, ProcessCoord(pipe=2, data=12, model=0): 176, ProcessCoord(pipe=2, data=12, model=1): 177, ProcessCoord(pipe=2, data=12, model=2): 178, ProcessCoord(pipe=2, data=12, model=3): 179, ProcessCoord(pipe=2, data=13, model=0): 180, ProcessCoord(pipe=2, data=13, model=1): 181, ProcessCoord(pipe=2, data=13, model=2): 182, ProcessCoord(pipe=2, data=13, model=3): 183, ProcessCoord(pipe=2, data=14, model=0): 184, ProcessCoord(pipe=2, data=14, model=1): 185, ProcessCoord(pipe=2, data=14, model=2): 186, ProcessCoord(pipe=2, data=14, model=3): 187, ProcessCoord(pipe=2, data=15, model=0): 188, ProcessCoord(pipe=2, data=15, model=1): 189, ProcessCoord(pipe=2, data=15, model=2): 190, ProcessCoord(pipe=2, data=15, model=3): 191, ProcessCoord(pipe=3, data=0, model=0): 192, ProcessCoord(pipe=3, data=0, model=1): 193, ProcessCoord(pipe=3, data=0, model=2): 194, ProcessCoord(pipe=3, data=0, model=3): 195, ProcessCoord(pipe=3, data=1, model=0): 196, ProcessCoord(pipe=3, data=1, model=1): 197, ProcessCoord(pipe=3, data=1, model=2): 198, ProcessCoord(pipe=3, data=1, model=3): 199, ProcessCoord(pipe=3, data=2, model=0): 200, ProcessCoord(pipe=3, data=2, model=1): 201, ProcessCoord(pipe=3, data=2, model=2): 202, ProcessCoord(pipe=3, data=2, model=3): 203, ProcessCoord(pipe=3, data=3, model=0): 204, ProcessCoord(pipe=3, data=3, model=1): 205, ProcessCoord(pipe=3, data=3, model=2): 206, ProcessCoord(pipe=3, data=3, model=3): 207, ProcessCoord(pipe=3, data=4, model=0): 208, ProcessCoord(pipe=3, data=4, model=1): 209, ProcessCoord(pipe=3, data=4, model=2): 210, ProcessCoord(pipe=3, data=4, model=3): 211, ProcessCoord(pipe=3, data=5, model=0): 212, ProcessCoord(pipe=3, data=5, model=1): 213, ProcessCoord(pipe=3, data=5, model=2): 214, ProcessCoord(pipe=3, data=5, model=3): 215, ProcessCoord(pipe=3, data=6, model=0): 216, ProcessCoord(pipe=3, data=6, model=1): 217, ProcessCoord(pipe=3, data=6, model=2): 218, ProcessCoord(pipe=3, data=6, model=3): 219, ProcessCoord(pipe=3, data=7, model=0): 220, ProcessCoord(pipe=3, data=7, model=1): 221, ProcessCoord(pipe=3, data=7, model=2): 222, ProcessCoord(pipe=3, data=7, model=3): 223, ProcessCoord(pipe=3, data=8, model=0): 224, 
ProcessCoord(pipe=3, data=8, model=1): 225, ProcessCoord(pipe=3, data=8, model=2): 226, ProcessCoord(pipe=3, data=8, model=3): 227, ProcessCoord(pipe=3, data=9, model=0): 228, ProcessCoord(pipe=3, data=9, model=1): 229, ProcessCoord(pipe=3, data=9, model=2): 230, ProcessCoord(pipe=3, data=9, model=3): 231, ProcessCoord(pipe=3, data=10, model=0): 232, ProcessCoord(pipe=3, data=10, model=1): 233, ProcessCoord(pipe=3, data=10, model=2): 234, ProcessCoord(pipe=3, data=10, model=3): 235, ProcessCoord(pipe=3, data=11, model=0): 236, ProcessCoord(pipe=3, data=11, model=1): 237, ProcessCoord(pipe=3, data=11, model=2): 238, ProcessCoord(pipe=3, data=11, model=3): 239, ProcessCoord(pipe=3, data=12, model=0): 240, ProcessCoord(pipe=3, data=12, model=1): 241, ProcessCoord(pipe=3, data=12, model=2): 242, ProcessCoord(pipe=3, data=12, model=3): 243, ProcessCoord(pipe=3, data=13, model=0): 244, ProcessCoord(pipe=3, data=13, model=1): 245, ProcessCoord(pipe=3, data=13, model=2): 246, ProcessCoord(pipe=3, data=13, model=3): 247, ProcessCoord(pipe=3, data=14, model=0): 248, ProcessCoord(pipe=3, data=14, model=1): 249, ProcessCoord(pipe=3, data=14, model=2): 250, ProcessCoord(pipe=3, data=14, model=3): 251, ProcessCoord(pipe=3, data=15, model=0): 252, ProcessCoord(pipe=3, data=15, model=1): 253, ProcessCoord(pipe=3, data=15, model=2): 254, ProcessCoord(pipe=3, data=15, model=3): 255, ProcessCoord(pipe=4, data=0, model=0): 256, ProcessCoord(pipe=4, data=0, model=1): 257, ProcessCoord(pipe=4, data=0, model=2): 258, ProcessCoord(pipe=4, data=0, model=3): 259, ProcessCoord(pipe=4, data=1, model=0): 260, ProcessCoord(pipe=4, data=1, model=1): 261, ProcessCoord(pipe=4, data=1, model=2): 262, ProcessCoord(pipe=4, data=1, model=3): 263, ProcessCoord(pipe=4, data=2, model=0): 264, ProcessCoord(pipe=4, data=2, model=1): 265, ProcessCoord(pipe=4, data=2, model=2): 266, ProcessCoord(pipe=4, data=2, model=3): 267, ProcessCoord(pipe=4, data=3, model=0): 268, ProcessCoord(pipe=4, data=3, model=1): 269, ProcessCoord(pipe=4, data=3, model=2): 270, ProcessCoord(pipe=4, data=3, model=3): 271, ProcessCoord(pipe=4, data=4, model=0): 272, ProcessCoord(pipe=4, data=4, model=1): 273, ProcessCoord(pipe=4, data=4, model=2): 274, ProcessCoord(pipe=4, data=4, model=3): 275, ProcessCoord(pipe=4, data=5, model=0): 276, ProcessCoord(pipe=4, data=5, model=1): 277, ProcessCoord(pipe=4, data=5, model=2): 278, ProcessCoord(pipe=4, data=5, model=3): 279, ProcessCoord(pipe=4, data=6, model=0): 280, ProcessCoord(pipe=4, data=6, model=1): 281, ProcessCoord(pipe=4, data=6, model=2): 282, ProcessCoord(pipe=4, data=6, model=3): 283, ProcessCoord(pipe=4, data=7, model=0): 284, ProcessCoord(pipe=4, data=7, model=1): 285, ProcessCoord(pipe=4, data=7, model=2): 286, ProcessCoord(pipe=4, data=7, model=3): 287, ProcessCoord(pipe=4, data=8, model=0): 288, ProcessCoord(pipe=4, data=8, model=1): 289, ProcessCoord(pipe=4, data=8, model=2): 290, ProcessCoord(pipe=4, data=8, model=3): 291, ProcessCoord(pipe=4, data=9, model=0): 292, ProcessCoord(pipe=4, data=9, model=1): 293, ProcessCoord(pipe=4, data=9, model=2): 294, ProcessCoord(pipe=4, data=9, model=3): 295, ProcessCoord(pipe=4, data=10, model=0): 296, ProcessCoord(pipe=4, data=10, model=1): 297, ProcessCoord(pipe=4, data=10, model=2): 298, ProcessCoord(pipe=4, data=10, model=3): 299, ProcessCoord(pipe=4, data=11, model=0): 300, ProcessCoord(pipe=4, data=11, model=1): 301, ProcessCoord(pipe=4, data=11, model=2): 302, ProcessCoord(pipe=4, data=11, model=3): 303, ProcessCoord(pipe=4, data=12, model=0): 304, 
ProcessCoord(pipe=4, data=12, model=1): 305, ProcessCoord(pipe=4, data=12, model=2): 306, ProcessCoord(pipe=4, data=12, model=3): 307, ProcessCoord(pipe=4, data=13, model=0): 308, ProcessCoord(pipe=4, data=13, model=1): 309, ProcessCoord(pipe=4, data=13, model=2): 310, ProcessCoord(pipe=4, data=13, model=3): 311, ProcessCoord(pipe=4, data=14, model=0): 312, ProcessCoord(pipe=4, data=14, model=1): 313, ProcessCoord(pipe=4, data=14, model=2): 314, ProcessCoord(pipe=4, data=14, model=3): 315, ProcessCoord(pipe=4, data=15, model=0): 316, ProcessCoord(pipe=4, data=15, model=1): 317, ProcessCoord(pipe=4, data=15, model=2): 318, ProcessCoord(pipe=4, data=15, model=3): 319, ProcessCoord(pipe=5, data=0, model=0): 320, ProcessCoord(pipe=5, data=0, model=1): 321, ProcessCoord(pipe=5, data=0, model=2): 322, ProcessCoord(pipe=5, data=0, model=3): 323, ProcessCoord(pipe=5, data=1, model=0): 324, ProcessCoord(pipe=5, data=1, model=1): 325, ProcessCoord(pipe=5, data=1, model=2): 326, ProcessCoord(pipe=5, data=1, model=3): 327, ProcessCoord(pipe=5, data=2, model=0): 328, ProcessCoord(pipe=5, data=2, model=1): 329, ProcessCoord(pipe=5, data=2, model=2): 330, ProcessCoord(pipe=5, data=2, model=3): 331, ProcessCoord(pipe=5, data=3, model=0): 332, ProcessCoord(pipe=5, data=3, model=1): 333, ProcessCoord(pipe=5, data=3, model=2): 334, ProcessCoord(pipe=5, data=3, model=3): 335, ProcessCoord(pipe=5, data=4, model=0): 336, ProcessCoord(pipe=5, data=4, model=1): 337, ProcessCoord(pipe=5, data=4, model=2): 338, ProcessCoord(pipe=5, data=4, model=3): 339, ProcessCoord(pipe=5, data=5, model=0): 340, ProcessCoord(pipe=5, data=5, model=1): 341, ProcessCoord(pipe=5, data=5, model=2): 342, ProcessCoord(pipe=5, data=5, model=3): 343, ProcessCoord(pipe=5, data=6, model=0): 344, ProcessCoord(pipe=5, data=6, model=1): 345, ProcessCoord(pipe=5, data=6, model=2): 346, ProcessCoord(pipe=5, data=6, model=3): 347, ProcessCoord(pipe=5, data=7, model=0): 348, ProcessCoord(pipe=5, data=7, model=1): 349, ProcessCoord(pipe=5, data=7, model=2): 350, ProcessCoord(pipe=5, data=7, model=3): 351, ProcessCoord(pipe=5, data=8, model=0): 352, ProcessCoord(pipe=5, data=8, model=1): 353, ProcessCoord(pipe=5, data=8, model=2): 354, ProcessCoord(pipe=5, data=8, model=3): 355, ProcessCoord(pipe=5, data=9, model=0): 356, ProcessCoord(pipe=5, data=9, model=1): 357, ProcessCoord(pipe=5, data=9, model=2): 358, ProcessCoord(pipe=5, data=9, model=3): 359, ProcessCoord(pipe=5, data=10, model=0): 360, ProcessCoord(pipe=5, data=10, model=1): 361, ProcessCoord(pipe=5, data=10, model=2): 362, ProcessCoord(pipe=5, data=10, model=3): 363, ProcessCoord(pipe=5, data=11, model=0): 364, ProcessCoord(pipe=5, data=11, model=1): 365, ProcessCoord(pipe=5, data=11, model=2): 366, ProcessCoord(pipe=5, data=11, model=3): 367, ProcessCoord(pipe=5, data=12, model=0): 368, ProcessCoord(pipe=5, data=12, model=1): 369, ProcessCoord(pipe=5, data=12, model=2): 370, ProcessCoord(pipe=5, data=12, model=3): 371, ProcessCoord(pipe=5, data=13, model=0): 372, ProcessCoord(pipe=5, data=13, model=1): 373, ProcessCoord(pipe=5, data=13, model=2): 374, ProcessCoord(pipe=5, data=13, model=3): 375, ProcessCoord(pipe=5, data=14, model=0): 376, ProcessCoord(pipe=5, data=14, model=1): 377, ProcessCoord(pipe=5, data=14, model=2): 378, ProcessCoord(pipe=5, data=14, model=3): 379, ProcessCoord(pipe=5, data=15, model=0): 380, ProcessCoord(pipe=5, data=15, model=1): 381, ProcessCoord(pipe=5, data=15, model=2): 382, ProcessCoord(pipe=5, data=15, model=3): 383, ProcessCoord(pipe=6, data=0, model=0): 
384, ProcessCoord(pipe=6, data=0, model=1): 385, ProcessCoord(pipe=6, data=0, model=2): 386, ProcessCoord(pipe=6, data=0, model=3): 387, ProcessCoord(pipe=6, data=1, model=0): 388, ProcessCoord(pipe=6, data=1, model=1): 389, ProcessCoord(pipe=6, data=1, model=2): 390, ProcessCoord(pipe=6, data=1, model=3): 391, ProcessCoord(pipe=6, data=2, model=0): 392, ProcessCoord(pipe=6, data=2, model=1): 393, ProcessCoord(pipe=6, data=2, model=2): 394, ProcessCoord(pipe=6, data=2, model=3): 395, ProcessCoord(pipe=6, data=3, model=0): 396, ProcessCoord(pipe=6, data=3, model=1): 397, ProcessCoord(pipe=6, data=3, model=2): 398, ProcessCoord(pipe=6, data=3, model=3): 399, ProcessCoord(pipe=6, data=4, model=0): 400, ProcessCoord(pipe=6, data=4, model=1): 401, ProcessCoord(pipe=6, data=4, model=2): 402, ProcessCoord(pipe=6, data=4, model=3): 403, ProcessCoord(pipe=6, data=5, model=0): 404, ProcessCoord(pipe=6, data=5, model=1): 405, ProcessCoord(pipe=6, data=5, model=2): 406, ProcessCoord(pipe=6, data=5, model=3): 407, ProcessCoord(pipe=6, data=6, model=0): 408, ProcessCoord(pipe=6, data=6, model=1): 409, ProcessCoord(pipe=6, data=6, model=2): 410, ProcessCoord(pipe=6, data=6, model=3): 411, ProcessCoord(pipe=6, data=7, model=0): 412, ProcessCoord(pipe=6, data=7, model=1): 413, ProcessCoord(pipe=6, data=7, model=2): 414, ProcessCoord(pipe=6, data=7, model=3): 415, ProcessCoord(pipe=6, data=8, model=0): 416, ProcessCoord(pipe=6, data=8, model=1): 417, ProcessCoord(pipe=6, data=8, model=2): 418, ProcessCoord(pipe=6, data=8, model=3): 419, ProcessCoord(pipe=6, data=9, model=0): 420, ProcessCoord(pipe=6, data=9, model=1): 421, ProcessCoord(pipe=6, data=9, model=2): 422, ProcessCoord(pipe=6, data=9, model=3): 423, ProcessCoord(pipe=6, data=10, model=0): 424, ProcessCoord(pipe=6, data=10, model=1): 425, ProcessCoord(pipe=6, data=10, model=2): 426, ProcessCoord(pipe=6, data=10, model=3): 427, ProcessCoord(pipe=6, data=11, model=0): 428, ProcessCoord(pipe=6, data=11, model=1): 429, ProcessCoord(pipe=6, data=11, model=2): 430, ProcessCoord(pipe=6, data=11, model=3): 431, ProcessCoord(pipe=6, data=12, model=0): 432, ProcessCoord(pipe=6, data=12, model=1): 433, ProcessCoord(pipe=6, data=12, model=2): 434, ProcessCoord(pipe=6, data=12, model=3): 435, ProcessCoord(pipe=6, data=13, model=0): 436, ProcessCoord(pipe=6, data=13, model=1): 437, ProcessCoord(pipe=6, data=13, model=2): 438, ProcessCoord(pipe=6, data=13, model=3): 439, ProcessCoord(pipe=6, data=14, model=0): 440, ProcessCoord(pipe=6, data=14, model=1): 441, ProcessCoord(pipe=6, data=14, model=2): 442, ProcessCoord(pipe=6, data=14, model=3): 443, ProcessCoord(pipe=6, data=15, model=0): 444, ProcessCoord(pipe=6, data=15, model=1): 445, ProcessCoord(pipe=6, data=15, model=2): 446, ProcessCoord(pipe=6, data=15, model=3): 447, ProcessCoord(pipe=7, data=0, model=0): 448, ProcessCoord(pipe=7, data=0, model=1): 449, ProcessCoord(pipe=7, data=0, model=2): 450, ProcessCoord(pipe=7, data=0, model=3): 451, ProcessCoord(pipe=7, data=1, model=0): 452, ProcessCoord(pipe=7, data=1, model=1): 453, ProcessCoord(pipe=7, data=1, model=2): 454, ProcessCoord(pipe=7, data=1, model=3): 455, ProcessCoord(pipe=7, data=2, model=0): 456, ProcessCoord(pipe=7, data=2, model=1): 457, ProcessCoord(pipe=7, data=2, model=2): 458, ProcessCoord(pipe=7, data=2, model=3): 459, ProcessCoord(pipe=7, data=3, model=0): 460, ProcessCoord(pipe=7, data=3, model=1): 461, ProcessCoord(pipe=7, data=3, model=2): 462, ProcessCoord(pipe=7, data=3, model=3): 463, ProcessCoord(pipe=7, data=4, model=0): 464, 
ProcessCoord(pipe=7, data=4, model=1): 465, ProcessCoord(pipe=7, data=4, model=2): 466, ProcessCoord(pipe=7, data=4, model=3): 467, ProcessCoord(pipe=7, data=5, model=0): 468, ProcessCoord(pipe=7, data=5, model=1): 469, ProcessCoord(pipe=7, data=5, model=2): 470, ProcessCoord(pipe=7, data=5, model=3): 471, ProcessCoord(pipe=7, data=6, model=0): 472, ProcessCoord(pipe=7, data=6, model=1): 473, ProcessCoord(pipe=7, data=6, model=2): 474, ProcessCoord(pipe=7, data=6, model=3): 475, ProcessCoord(pipe=7, data=7, model=0): 476, ProcessCoord(pipe=7, data=7, model=1): 477, ProcessCoord(pipe=7, data=7, model=2): 478, ProcessCoord(pipe=7, data=7, model=3): 479, ProcessCoord(pipe=7, data=8, model=0): 480, ProcessCoord(pipe=7, data=8, model=1): 481, ProcessCoord(pipe=7, data=8, model=2): 482, ProcessCoord(pipe=7, data=8, model=3): 483, ProcessCoord(pipe=7, data=9, model=0): 484, ProcessCoord(pipe=7, data=9, model=1): 485, ProcessCoord(pipe=7, data=9, model=2): 486, ProcessCoord(pipe=7, data=9, model=3): 487, ProcessCoord(pipe=7, data=10, model=0): 488, ProcessCoord(pipe=7, data=10, model=1): 489, ProcessCoord(pipe=7, data=10, model=2): 490, ProcessCoord(pipe=7, data=10, model=3): 491, ProcessCoord(pipe=7, data=11, model=0): 492, ProcessCoord(pipe=7, data=11, model=1): 493, ProcessCoord(pipe=7, data=11, model=2): 494, ProcessCoord(pipe=7, data=11, model=3): 495, ProcessCoord(pipe=7, data=12, model=0): 496, ProcessCoord(pipe=7, data=12, model=1): 497, ProcessCoord(pipe=7, data=12, model=2): 498, ProcessCoord(pipe=7, data=12, model=3): 499, ProcessCoord(pipe=7, data=13, model=0): 500, ProcessCoord(pipe=7, data=13, model=1): 501, ProcessCoord(pipe=7, data=13, model=2): 502, ProcessCoord(pipe=7, data=13, model=3): 503, ProcessCoord(pipe=7, data=14, model=0): 504, ProcessCoord(pipe=7, data=14, model=1): 505, ProcessCoord(pipe=7, data=14, model=2): 506, ProcessCoord(pipe=7, data=14, model=3): 507, ProcessCoord(pipe=7, data=15, model=0): 508, ProcessCoord(pipe=7, data=15, model=1): 509, ProcessCoord(pipe=7, data=15, model=2): 510, ProcessCoord(pipe=7, data=15, model=3): 511}
[2021-09-27 03:54:57,678] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
stage=1 layers=4
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
stage=2 layers=4
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=3 layers=4
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
stage=4 layers=4
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
stage=5 layers=4
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
stage=6 layers=4
    27: ParallelTransformerLayerPipe
    28: ParallelTransformerLayerPipe
    29: ParallelTransformerLayerPipe
    30: ParallelTransformerLayerPipe
stage=7 layers=8
    31: ParallelTransformerLayerPipe
    32: ParallelTransformerLayerPipe
    33: ParallelTransformerLayerPipe
    34: ParallelTransformerLayerPipe
    35:
    36: MixedFusedLayerNorm
    37: EmbeddingPipe
    38: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
[2021-09-27 03:54:59,504] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 03:54:59,505] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB
[2021-09-27 03:54:59,505] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.03 GB, percent = 19.8%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792
setting training iterations to 159576
> learning rate decay style: cosine
DeepSpeed is enabled.
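The topology and parameter counts above are internally consistent and worth decoding: the 512 ranks form an 8-way pipeline x 16-way data x 4-way tensor (model) parallel grid with ranks assigned in pipe-data-model order, and the first and last pipeline stages carry the tied embedding on top of their transformer layers. A few sanity checks written for this log (a sketch, not part of the training code; the rank formula is inferred from the printed mapping):

    # Parallelism degrees read off the topology above.
    PP, DP, TP = 8, 16, 4

    def rank(pipe: int, data: int, model: int) -> int:
        # ProcessCoord(pipe, data, model) -> global rank, per the mapping above
        return (pipe * DP + data) * TP + model

    assert rank(0, 0, 3) == 3
    assert rank(1, 0, 0) == 64
    assert rank(7, 15, 3) == 511 == PP * DP * TP - 1

    # Global batch size: 1 sample per GPU per micro-batch, 128 accumulation
    # steps, replicated over the 16 data-parallel groups.
    assert 1 * 128 * DP == 2048  # matches train_batch_size below

    # Stage 0 = 4 transformer layers + EmbeddingPipe; the tied embedding is
    # counted on both stage 0 and stage 7 in TOTAL_PARAMS, but only once in
    # UNIQUE_PARAMS (numbers from the engine.py RANK lines below).
    embedding_per_tp_shard = 1986465792 - 1745293312  # 241_172_480
    assert 57778896896 - 56814206976 == TP * embedding_per_tp_shard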
[2021-09-27 03:54:59,644] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science
[2021-09-27 03:54:59,753] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 03:54:59,753] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 03:54:59,753] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 03:54:59,753] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 03:54:59,753] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 03:54:59,753] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 03:54:59,753] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 03:54:59,753] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 03:55:04,471] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 03:55:04,471] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 03:55:04,471] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 03:55:04,471] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_accumulation_steps .. 128
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-27 03:55:04,472] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] world_size ................... 16
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-27 03:55:04,473] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-27 03:55:04,473] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-27 03:55:04,474] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=128 micro_batch_size=1
[2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,910] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=259 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=256 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=258 STAGE=4 LAYERS=4 [19, 23)
STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=257 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=130 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=129 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=131 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=128 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=384 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=385 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=386 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=387 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=194 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=195 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=193 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=192 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=449 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=448 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=451 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=321 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) 
TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=320 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=322 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=323 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=66 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=67 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=64 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=450 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 03:55:04,911] [INFO] [engine.py:134:__init__] RANK=65 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) > using checkpoint value 6e-05 for learning rate > using checkpoint value 6e-06 for minimum learning rate > using checkpoint value 216320 for warmup iterations > using checkpoint value 126953125 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 8 ZeRO state_dicts for rank 384 successfully loaded 8 ZeRO state_dicts for rank 424 successfully loaded 8 ZeRO state_dicts for rank 444 successfully loaded 8 ZeRO state_dicts for rank 400 successfully loaded 8 ZeRO state_dicts for rank 261 successfully loaded 8 ZeRO state_dicts for rank 432 successfully loaded 8 ZeRO state_dicts for rank 420 successfully loaded 8 ZeRO state_dicts for rank 152 successfully loaded 8 ZeRO state_dicts for rank 440 successfully loaded 8 ZeRO state_dicts for rank 387 successfully loaded 8 ZeRO state_dicts for rank 296 successfully loaded 8 ZeRO state_dicts for rank 392 successfully loaded 8 ZeRO state_dicts for rank 196 successfully loaded 8 ZeRO state_dicts for rank 338 successfully loaded 8 ZeRO state_dicts for rank 379 successfully loaded 8 ZeRO state_dicts for rank 336 loading 8 zero partition checkpoints for rank 384 successfully loaded 8 ZeRO state_dicts for rank 385 successfully loaded 8 ZeRO state_dicts for rank 445 successfully loaded 8 ZeRO state_dicts for rank 84 successfully loaded 8 ZeRO state_dicts for rank 86 successfully loaded 8 ZeRO state_dicts for rank 428 successfully loaded 8 ZeRO state_dicts for rank 337 successfully loaded 8 ZeRO state_dicts for rank 416 successfully loaded 8 ZeRO state_dicts for rank 436 loading 8 zero partition checkpoints for rank 424 successfully loaded 8 ZeRO state_dicts for rank 88 loading 8 zero partition checkpoints for rank 444 successfully loaded 8 ZeRO state_dicts for rank 376 successfully loaded 8 ZeRO state_dicts for rank 125 successfully 
[interleaved per-rank checkpoint-load messages elided: each of the 512 training ranks (0-511) logged "successfully loaded 8 ZeRO state_dicts for rank N" followed by "loading 8 zero partition checkpoints for rank N"; all ranks reported success]
checkpoint version 3.0
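These per-rank messages come from DeepSpeed's ZeRO checkpoint loader. Below is a minimal sketch of how such a resume is triggered; the toy model, config values, and checkpoint path are illustrative assumptions, not the actual tr8-104B setup.

    import torch
    import deepspeed

    # Toy stand-in for the real model; all values here are illustrative only.
    model = torch.nn.Linear(16, 16)

    ds_config = {
        "train_batch_size": 8,
        # ZeRO stage 1 partitions optimizer state across data-parallel ranks,
        # which is why each rank loads its own set of partition files above.
        "zero_optimization": {"stage": 1},
        "optimizer": {"type": "Adam", "params": {"lr": 6e-5}},
    }

    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )

    # load_checkpoint() restores module weights plus each rank's partition of
    # the optimizer state; this is the call that emits the per-rank
    # "loaded 8 ZeRO state_dicts" / "loading 8 zero partition checkpoints" lines.
    load_path, client_state = engine.load_checkpoint(
        "/path/to/checkpoints",          # illustrative path
        load_optimizer_states=True,
        load_lr_scheduler_states=True,
    )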
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 9768
time (ms) | load-checkpoint: 91243.56
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 03:56:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.143013 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.289 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.388 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_42s_shuffle_idx.npy
    loaded indexed file in 0.061 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 03:56:43
done with setup ...
training ...
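The split boundaries above correspond to a 94.9% / 5.0% / 0.1% train/validation/test document split, and the train index map spans 3 epochs (394,611,670 samples) to cover the 300,000,000-sample target. A quick check, using only the numbers printed above:

    total = 304_230_423                      # number of documents
    train_end, valid_end = 288_714_672, 303_926_193

    print(train_end / total)                 # ~0.949  -> train
    print((valid_end - train_end) / total)   # ~0.050  -> validation
    print((total - valid_end) / total)       # ~0.001  -> test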
time (ms) | model-and-optimizer-setup: 102057.80 | train/valid/test-data-iterators-setup: 5731.66
[before the start of training step] datetime: 2021-09-27 03:56:43
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 03:56:43,457] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
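The five checkpointing INFO lines map one-to-one onto DeepSpeed's activation_checkpointing config section. A sketch of the corresponding config dict; the key names are DeepSpeed's, but reading the log lines back onto these keys is our interpretation:

    # Mirrors the "Activation Checkpointing Information" block above.
    activation_checkpointing_config = {
        "activation_checkpointing": {
            "partition_activations": False,           # ----Partition Activations False
            "cpu_checkpointing": False,               # ----CPU CHECKPOINTING False
            "contiguous_memory_optimization": False,  # ----contiguous Memory Checkpointing False
            "synchronize_checkpoint_boundary": False, # ----Synchronization False
            "profile": False,                         # ----Profiling time in checkpointing False
        }
    }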
[Rank 192] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15132.0 | max reserved: 15132.0
[Rank 129] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15500.0 | max reserved: 15500.0
[Rank 130] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15364.0 | max reserved: 15364.0
[Rank 64] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15820.0 | max reserved: 15820.0
[Rank 0] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 2] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 17788.0 | max reserved: 17788.0
[Rank 256] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14812.0 | max reserved: 14812.0
[Rank 257] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14940.0 | max reserved: 14940.0
[Rank 193] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15096.0 | max reserved: 15096.0
[Rank 194] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15112.0 | max reserved: 15112.0
[Rank 128] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15456.0 | max reserved: 15456.0
[Rank 385] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 320] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14716.0 | max reserved: 14716.0
[Rank 65] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15632.0 | max reserved: 15632.0
[Rank 1] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 258] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14696.0 | max reserved: 14696.0
[Rank 131] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10562.13623046875 | reserved: 15532.0 | max reserved: 15532.0
[Rank 384] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14268.0 | max reserved: 14268.0
[Rank 449] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.337890625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 448] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33642578125 | reserved: 15736.0 | max reserved: 15736.0
[Rank 322] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14616.0 | max reserved: 14616.0
[Rank 66] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15828.0 | max reserved: 15828.0
[Rank 3] (after 9770 iterations) memory (MB) | allocated: 5267.49951171875 | max allocated: 12476.68310546875 | reserved: 18256.0 | max reserved: 18256.0
[Rank 259] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10018.13525390625 | reserved: 14712.0 | max reserved: 14712.0
[Rank 195] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10290.1357421875 | reserved: 15208.0 | max reserved: 15208.0
[Rank 387] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 386] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9474.13427734375 | reserved: 14312.0 | max reserved: 14312.0
[Rank 451] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.3369140625 | reserved: 15736.0 | max reserved: 15736.0
[Rank 323] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14648.0 | max reserved: 14648.0
[Rank 67] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 10834.13671875 | reserved: 15536.0 | max reserved: 15536.0
[Rank 450] (after 9770 iterations) memory (MB) | allocated: 5685.35986328125 | max allocated: 10463.33544921875 | reserved: 15736.0 | max reserved: 15736.0
[Rank 321] (after 9770 iterations) memory (MB) | allocated: 4613.21923828125 | max allocated: 9746.134765625 | reserved: 14684.0 | max reserved: 14684.0
iteration 9770/ 159576 | consumed samples: 701760 | elapsed time per iteration (ms): 21146.4 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9780/ 159576 | consumed samples: 704160 | elapsed time per iteration (ms): 13340.2 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9790/ 159576 | consumed samples: 706560 | elapsed time per iteration (ms): 13419.1 | learning rate: 6.000E-05 | global batch size: 240 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9800/ 159576 | consumed samples: 708976 | elapsed time per iteration (ms): 13591.3 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 9810/ 159576 | consumed samples: 711536 | elapsed time per iteration (ms): 13986.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
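At a global batch size of 256 and roughly 14 s per iteration, the run is processing about 18 samples/sec, and consumed samples advance by batch size times the log interval between entries. Checking two adjacent entries above:

    batch, log_interval = 256, 10
    samples_9800, samples_9810 = 708_976, 711_536
    assert samples_9810 - samples_9800 == batch * log_interval  # 2560

    ms_per_iter = 13_986.8               # at iteration 9810
    print(batch / (ms_per_iter / 1000))  # ~18.3 samples/sec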
(ms): 14105.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9830/ 159576 | consumed samples: 716656 | elapsed time per iteration (ms): 14030.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9840/ 159576 | consumed samples: 719216 | elapsed time per iteration (ms): 14188.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 04:14:28] PULSE: tr8-104B is running for 20:12 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 9850/ 159576 | consumed samples: 721776 | elapsed time per iteration (ms): 14071.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9860/ 159576 | consumed samples: 724336 | elapsed time per iteration (ms): 14125.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9870/ 159576 | consumed samples: 726896 | elapsed time per iteration (ms): 14170.2 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9880/ 159576 | consumed samples: 729456 | elapsed time per iteration (ms): 14139.5 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9890/ 159576 | consumed samples: 732016 | elapsed time per iteration (ms): 14156.0 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9900/ 159576 | consumed samples: 734576 | elapsed time per iteration (ms): 14057.9 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9910/ 159576 | consumed samples: 737136 | elapsed time per iteration (ms): 14129.8 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9920/ 159576 | consumed samples: 739696 | elapsed time per iteration (ms): 14157.7 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 9930/ 159576 | consumed samples: 742256 | elapsed time per iteration (ms): 14024.1 | learning rate: 6.000E-05 | global batch size: 256 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | 
iteration | consumed samples | elapsed ms/iter | global batch size
9850 | 721776 | 14071.1 | 256
9860 | 724336 | 14125.1 | 256
9870 | 726896 | 14170.2 | 256
9880 | 729456 | 14139.5 | 256
9890 | 732016 | 14156.0 | 256
9900 | 734576 | 14057.9 | 256
9910 | 737136 | 14129.8 | 256
9920 | 739696 | 14157.7 | 256
9930 | 742256 | 14024.1 | 256
9940 | 744816 | 13971.4 | 256
9950 | 747376 | 14101.5 | 256
9960 | 749936 | 14210.0 | 256
9970 | 752496 | 14219.6 | 256
9980 | 755056 | 14117.6 | 256
9990 | 757712 | 14400.0 | 272

[2021-09-27 04:51:19,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=1052, lr=[5.999919375575235e-05, 5.999919375575235e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: nan iter time (s): 0.007 samples/sec: 37472.688

10000 | 760432 | 14648.0 | 272

-------------------------------------------------------------------------------------------------
validation loss at iteration 10000 | lm loss value: 7.270623E+00 | lm loss PPL: 1.437445E+03 |
-------------------------------------------------------------------------------------------------
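The PPL column is simply e raised to the lm loss value; checking the iteration-10000 validation row:

    import math
    # lm loss value: 7.270623E+00  ->  lm loss PPL: 1.437445E+03
    print(math.exp(7.270623))   # 1437.44...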
iteration | consumed samples | elapsed ms/iter | global batch size
10010 | 763152 | 16469.3 | 272
10020 | 765872 | 14573.2 | 272
10030 | 768592 | 14611.8 | 272
10040 | 771312 | 14782.8 | 272
10050 | 774032 | 14722.8 | 272
10060 | 776752 | 14595.9 | 272
10070 | 779472 | 14712.5 | 272
10080 | 782192 | 14640.3 | 272
10090 | 784912 | 15060.9 | 272

[2021-09-27 05:14:32] PULSE: tr8-104B is running for 1:20:16 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition, same node list as above)

10100 | 787632 | 14624.0 | 272
10110 | 790352 | 14621.7 | 272
10120 | 793072 | 14685.1 | 272
10130 | 795792 | 14531.8 | 272
10140 | 798512 | 14629.6 | 272
10150 | 801232 | 14771.8 | 272
10160 | 803984 | 14889.9 | 288
10170 | 806864 | 15471.9 | 288
10180 | 809744 | 15228.6 | 288
10190 | 812624 | 15425.1 | 288
10200 | 815504 | 15390.8 | 288
10210 | 818384 | 15293.9 | 288
10220 | 821264 | 15259.9 | 288
10230 | 824144 | 15547.4 | 288
10240 | 827024 | 15375.5 | 288
10250 | 829904 | 15322.8 | 288
10260 | 832784 | 15280.3 | 288
10270 | 835664 | 15390.4 | 288
10280 | 838544 | 15339.6 | 288
10290 | 841424 | 15252.5 | 288
10300 | 844304 | 15146.5 | 288
10310 | 847184 | 15389.7 | 288
10320 | 850064 | 15348.5 | 288
10330 | 853072 | 15779.0 | 304

[2021-09-27 06:14:35] PULSE: tr8-104B is running for 2:20:19 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition, same node list as above)

10340 | 856112 | 15864.8 | 304
10350 | 859152 | 15831.6 | 304
10360 | 862192 | 15954.9 | 304
10370 | 865232 | 15871.6 | 304
10380 | 868272 | 15850.1 | 304
10390 | 871312 | 15796.9 | 304
10400 | 874352 | 16082.6 | 304
10410 | 877392 | 16036.3 | 304
10420 | 880432 | 15898.1 | 304
10430 | 883472 | 15687.4 | 304
10440 | 886512 | 15579.4 | 304
10450 | 889552 | 16071.4 | 304
10460 | 892592 | 15986.9 | 304
10470 | 895632 | 15775.6 | 304
10480 | 898720 | 16164.1 | 320
10490 | 901920 | 16520.7 | 320
10500 | 905120 | 16597.6 | 320

saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-27 06:59:42,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step10500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 21886.11
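Checkpointing cost shows up in the next logging window: the 21886 ms save, amortized over the following 10 iterations, roughly accounts for the one-off jump to 18676.6 ms/iter at iteration 10510 just below (baseline taken from the neighbouring ~16.4 s iterations; back-of-the-envelope only):

    # save time spread across the 10 iterations of the 10510 logging window
    print(16429.2 + 21886.11 / 10)   # ~18617.8 ms, close to the logged 18676.6 ms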
1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10520/ 159576 | consumed samples: 911520 | elapsed time per iteration (ms): 16429.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10530/ 159576 | consumed samples: 914720 | elapsed time per iteration (ms): 16551.8 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10540/ 159576 | consumed samples: 917920 | elapsed time per iteration (ms): 16488.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10550/ 159576 | consumed samples: 921120 | elapsed time per iteration (ms): 16385.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 07:14:45] PULSE: tr8-104B is running for 3:20:29 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10560/ 159576 | consumed samples: 924320 | elapsed time per iteration (ms): 16352.3 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10570/ 159576 | consumed samples: 927520 | elapsed time per iteration (ms): 16281.1 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10580/ 159576 | consumed samples: 930720 | elapsed time per iteration (ms): 16433.2 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10590/ 159576 | consumed samples: 933920 | elapsed time per iteration (ms): 16276.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10600/ 159576 | consumed samples: 937120 | elapsed time per iteration (ms): 16510.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10610/ 159576 | consumed samples: 940320 | elapsed time per iteration (ms): 16415.6 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10620/ 159576 | consumed samples: 943520 | elapsed time per iteration (ms): 16211.4 | learning rate: 6.000E-05 | global batch size: 320 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time 
(ms) iteration 10630/ 159576 | consumed samples: 946800 | elapsed time per iteration (ms): 16664.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10640/ 159576 | consumed samples: 950160 | elapsed time per iteration (ms): 17041.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10650/ 159576 | consumed samples: 953520 | elapsed time per iteration (ms): 17363.3 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10660/ 159576 | consumed samples: 956880 | elapsed time per iteration (ms): 16944.5 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10670/ 159576 | consumed samples: 960240 | elapsed time per iteration (ms): 17142.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10680/ 159576 | consumed samples: 963600 | elapsed time per iteration (ms): 17139.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10690/ 159576 | consumed samples: 966960 | elapsed time per iteration (ms): 17104.6 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10700/ 159576 | consumed samples: 970320 | elapsed time per iteration (ms): 16968.9 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10710/ 159576 | consumed samples: 973680 | elapsed time per iteration (ms): 17071.1 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10720/ 159576 | consumed samples: 977040 | elapsed time per iteration (ms): 16939.7 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10730/ 159576 | consumed samples: 980400 | elapsed time per iteration (ms): 17182.0 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10740/ 159576 | consumed samples: 983760 | elapsed time per iteration (ms): 16947.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10750/ 159576 | consumed samples: 987120 | elapsed time per iteration (ms): 16887.4 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | time (ms) iteration 10760/ 159576 | consumed samples: 990480 | elapsed time per iteration (ms): 17060.2 | learning rate: 6.000E-05 | global batch size: 336 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 08:14:50] PULSE: tr8-104B is running for 4:20:34 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10770/ 159576 | consumed samples: 993920 | elapsed time per iteration (ms): 17207.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10780/ 159576 | consumed samples: 997440 | elapsed time per iteration (ms): 17439.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10790/ 159576 | consumed samples: 1000960 | elapsed time per iteration (ms): 17709.5 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10800/ 159576 | consumed samples: 1004480 | elapsed time per iteration (ms): 17397.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10810/ 159576 | consumed samples: 1008000 | elapsed time per iteration (ms): 17515.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10820/ 159576 | consumed samples: 1011520 | elapsed time per iteration (ms): 17500.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10830/ 159576 | consumed samples: 1015040 | elapsed time per iteration (ms): 17623.4 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10840/ 159576 | consumed samples: 1018560 | elapsed time per iteration (ms): 17764.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10850/ 159576 | consumed samples: 1022080 | elapsed time per iteration (ms): 17667.0 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10860/ 159576 | consumed samples: 1025600 | elapsed time per iteration (ms): 17590.6 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10870/ 159576 | consumed samples: 1029120 | elapsed 
time per iteration (ms): 17626.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10880/ 159576 | consumed samples: 1032640 | elapsed time per iteration (ms): 17668.3 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10890/ 159576 | consumed samples: 1036160 | elapsed time per iteration (ms): 17624.1 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10900/ 159576 | consumed samples: 1039680 | elapsed time per iteration (ms): 17793.8 | learning rate: 6.000E-05 | global batch size: 352 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10910/ 159576 | consumed samples: 1043360 | elapsed time per iteration (ms): 18188.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10920/ 159576 | consumed samples: 1047040 | elapsed time per iteration (ms): 18317.3 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10930/ 159576 | consumed samples: 1050720 | elapsed time per iteration (ms): 18324.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10940/ 159576 | consumed samples: 1054400 | elapsed time per iteration (ms): 18321.8 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10950/ 159576 | consumed samples: 1058080 | elapsed time per iteration (ms): 18321.0 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10960/ 159576 | consumed samples: 1061760 | elapsed time per iteration (ms): 18223.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 09:14:51] PULSE: tr8-104B is running for 5:20:35 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 10970/ 159576 | consumed samples: 1065440 | elapsed time per iteration (ms): 18268.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10980/ 159576 | consumed samples: 1069120 | elapsed time per iteration (ms): 18399.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 
1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 10990/ 159576 | consumed samples: 1072800 | elapsed time per iteration (ms): 18217.5 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11000/ 159576 | consumed samples: 1076480 | elapsed time per iteration (ms): 18260.1 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 11000 | lm loss value: 7.284734E+00 | lm loss PPL: 1.457873E+03 | ------------------------------------------------------------------------------------------------- iteration 11010/ 159576 | consumed samples: 1080160 | elapsed time per iteration (ms): 20666.6 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11020/ 159576 | consumed samples: 1083840 | elapsed time per iteration (ms): 18277.2 | learning rate: 6.000E-05 | global batch size: 368 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11030/ 159576 | consumed samples: 1087552 | elapsed time per iteration (ms): 18419.3 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11040/ 159576 | consumed samples: 1091392 | elapsed time per iteration (ms): 19002.0 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11050/ 159576 | consumed samples: 1095232 | elapsed time per iteration (ms): 18930.9 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11060/ 159576 | consumed samples: 1099072 | elapsed time per iteration (ms): 18821.2 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11070/ 159576 | consumed samples: 1102912 | elapsed time per iteration (ms): 18889.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11080/ 159576 | consumed samples: 1106752 | elapsed time per iteration (ms): 18970.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11090/ 159576 | consumed samples: 1110592 | elapsed time per iteration (ms): 18822.6 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11100/ 159576 | consumed samples: 1114432 | elapsed time per iteration (ms): 18697.2 | 
learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11110/ 159576 | consumed samples: 1118272 | elapsed time per iteration (ms): 18737.4 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11120/ 159576 | consumed samples: 1122112 | elapsed time per iteration (ms): 18949.1 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11130/ 159576 | consumed samples: 1125952 | elapsed time per iteration (ms): 19003.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11140/ 159576 | consumed samples: 1129792 | elapsed time per iteration (ms): 18836.8 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11150/ 159576 | consumed samples: 1133632 | elapsed time per iteration (ms): 18941.7 | learning rate: 6.000E-05 | global batch size: 384 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11160/ 159576 | consumed samples: 1137616 | elapsed time per iteration (ms): 19465.1 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 10:14:56] PULSE: tr8-104B is running for 6:20:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11170/ 159576 | consumed samples: 1141616 | elapsed time per iteration (ms): 19493.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11180/ 159576 | consumed samples: 1145616 | elapsed time per iteration (ms): 19504.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11190/ 159576 | consumed samples: 1149616 | elapsed time per iteration (ms): 19555.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11200/ 159576 | consumed samples: 1153616 | elapsed time per iteration (ms): 19490.6 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11210/ 159576 | consumed samples: 1157616 | elapsed time per iteration (ms): 19532.7 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11220/ 159576 | consumed samples: 1161616 | elapsed time per iteration (ms): 19261.8 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11230/ 159576 | consumed samples: 1165616 | elapsed time per iteration (ms): 19376.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11240/ 159576 | consumed samples: 1169616 | elapsed time per iteration (ms): 19505.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11250/ 159576 | consumed samples: 1173616 | elapsed time per iteration (ms): 19535.4 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11260/ 159576 | consumed samples: 1177616 | elapsed time per iteration (ms): 19415.2 | learning rate: 6.000E-05 | global batch size: 400 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11270/ 159576 | consumed samples: 1181632 | elapsed time per iteration (ms): 19446.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11280/ 159576 | consumed samples: 1185792 | elapsed time per iteration (ms): 20068.3 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11290/ 159576 | consumed samples: 1189952 | elapsed time per iteration (ms): 19947.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11300/ 159576 | consumed samples: 1194112 | elapsed time per iteration (ms): 20002.0 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11310/ 159576 | consumed samples: 1198272 | elapsed time per iteration (ms): 20006.4 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11320/ 159576 | consumed samples: 1202432 | elapsed time per iteration (ms): 20000.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11330/ 159576 | consumed samples: 1206592 | elapsed time per iteration (ms): 20065.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11340/ 159576 | consumed samples: 1210752 | elapsed time per iteration (ms): 19952.9 | learning rate: 6.000E-05 | global batch size: 416 | 
loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 11:15:05] PULSE: tr8-104B is running for 7:20:49 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11350/ 159576 | consumed samples: 1214912 | elapsed time per iteration (ms): 19989.1 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11360/ 159576 | consumed samples: 1219072 | elapsed time per iteration (ms): 19868.7 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11370/ 159576 | consumed samples: 1223232 | elapsed time per iteration (ms): 19987.6 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11380/ 159576 | consumed samples: 1227392 | elapsed time per iteration (ms): 19947.5 | learning rate: 6.000E-05 | global batch size: 416 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11390/ 159576 | consumed samples: 1231664 | elapsed time per iteration (ms): 20206.1 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11400/ 159576 | consumed samples: 1235984 | elapsed time per iteration (ms): 20686.4 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11410/ 159576 | consumed samples: 1240304 | elapsed time per iteration (ms): 20763.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11420/ 159576 | consumed samples: 1244624 | elapsed time per iteration (ms): 20718.0 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11430/ 159576 | consumed samples: 1248944 | elapsed time per iteration (ms): 20629.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11440/ 159576 | consumed samples: 1253264 | elapsed time per iteration (ms): 20735.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11450/ 159576 | consumed samples: 1257584 | elapsed time per iteration (ms): 20551.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | time (ms) iteration 11460/ 159576 | consumed samples: 1261904 | elapsed time per iteration (ms): 20425.6 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11470/ 159576 | consumed samples: 1266224 | elapsed time per iteration (ms): 20522.3 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11480/ 159576 | consumed samples: 1270544 | elapsed time per iteration (ms): 20523.5 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11490/ 159576 | consumed samples: 1274864 | elapsed time per iteration (ms): 20644.7 | learning rate: 6.000E-05 | global batch size: 432 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11500/ 159576 | consumed samples: 1279312 | elapsed time per iteration (ms): 21082.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11510/ 159576 | consumed samples: 1283792 | elapsed time per iteration (ms): 21312.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11520/ 159576 | consumed samples: 1288272 | elapsed time per iteration (ms): 21403.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11530/ 159576 | consumed samples: 1292752 | elapsed time per iteration (ms): 21133.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11540/ 159576 | consumed samples: 1297232 | elapsed time per iteration (ms): 21166.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11550/ 159576 | consumed samples: 1301712 | elapsed time per iteration (ms): 21259.6 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 12:27:56] PULSE: tr8-104B is running for 8:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11560/ 159576 | consumed samples: 1306192 | elapsed time per iteration (ms): 21050.1 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11570/ 159576 | consumed samples: 1310672 | elapsed time per 
iteration (ms): 21058.2 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11580/ 159576 | consumed samples: 1315152 | elapsed time per iteration (ms): 21057.7 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11590/ 159576 | consumed samples: 1319632 | elapsed time per iteration (ms): 21281.4 | learning rate: 6.000E-05 | global batch size: 448 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11600/ 159576 | consumed samples: 1324144 | elapsed time per iteration (ms): 21318.5 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11610/ 159576 | consumed samples: 1328784 | elapsed time per iteration (ms): 21769.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11620/ 159576 | consumed samples: 1333424 | elapsed time per iteration (ms): 21656.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11630/ 159576 | consumed samples: 1338064 | elapsed time per iteration (ms): 21947.9 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11640/ 159576 | consumed samples: 1342704 | elapsed time per iteration (ms): 21602.8 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11650/ 159576 | consumed samples: 1347344 | elapsed time per iteration (ms): 21770.3 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11660/ 159576 | consumed samples: 1351984 | elapsed time per iteration (ms): 21697.2 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11670/ 159576 | consumed samples: 1356624 | elapsed time per iteration (ms): 22004.7 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11680/ 159576 | consumed samples: 1361264 | elapsed time per iteration (ms): 21654.6 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11690/ 159576 | consumed samples: 1365904 | elapsed time per iteration (ms): 21840.4 | learning rate: 6.000E-05 | global batch size: 464 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11700/ 159576 
| consumed samples: 1370560 | elapsed time per iteration (ms): 21982.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11710/ 159576 | consumed samples: 1375360 | elapsed time per iteration (ms): 22227.6 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11720/ 159576 | consumed samples: 1380160 | elapsed time per iteration (ms): 22533.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 13:27:56] PULSE: tr8-104B is running for 9:33:40 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11730/ 159576 | consumed samples: 1384960 | elapsed time per iteration (ms): 22192.1 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11740/ 159576 | consumed samples: 1389760 | elapsed time per iteration (ms): 22268.7 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11750/ 159576 | consumed samples: 1394560 | elapsed time per iteration (ms): 22268.4 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11760/ 159576 | consumed samples: 1399360 | elapsed time per iteration (ms): 22141.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11770/ 159576 | consumed samples: 1404160 | elapsed time per iteration (ms): 21979.0 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11780/ 159576 | consumed samples: 1408960 | elapsed time per iteration (ms): 22172.2 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11790/ 159576 | consumed samples: 1413760 | elapsed time per iteration (ms): 22335.9 | learning rate: 6.000E-05 | global batch size: 480 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11800/ 159576 | consumed samples: 1418592 | elapsed time per iteration (ms): 22588.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11810/ 159576 | consumed samples: 1423552 | elapsed time per iteration (ms): 22823.4 | learning rate: 6.000E-05 | 
global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11820/ 159576 | consumed samples: 1428512 | elapsed time per iteration (ms): 22959.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11830/ 159576 | consumed samples: 1433472 | elapsed time per iteration (ms): 23080.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11840/ 159576 | consumed samples: 1438432 | elapsed time per iteration (ms): 23034.0 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11850/ 159576 | consumed samples: 1443392 | elapsed time per iteration (ms): 23099.6 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11860/ 159576 | consumed samples: 1448352 | elapsed time per iteration (ms): 23031.2 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11870/ 159576 | consumed samples: 1453312 | elapsed time per iteration (ms): 22866.8 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11880/ 159576 | consumed samples: 1458272 | elapsed time per iteration (ms): 23007.5 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-27 14:27:59] PULSE: tr8-104B is running for 10:33:43 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8]) iteration 11890/ 159576 | consumed samples: 1463232 | elapsed time per iteration (ms): 23034.3 | learning rate: 6.000E-05 | global batch size: 496 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11900/ 159576 | consumed samples: 1468304 | elapsed time per iteration (ms): 23486.5 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11910/ 159576 | consumed samples: 1473424 | elapsed time per iteration (ms): 23540.7 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 11920/ 159576 | consumed samples: 1478544 | elapsed time per iteration (ms): 23676.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped 
iteration 11930/ 159576 | consumed samples: 1483664 | elapsed time per iteration (ms): 23529.7 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11940/ 159576 | consumed samples: 1488784 | elapsed time per iteration (ms): 23604.1 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11950/ 159576 | consumed samples: 1493904 | elapsed time per iteration (ms): 23627.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11960/ 159576 | consumed samples: 1499024 | elapsed time per iteration (ms): 23559.5 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11970/ 159576 | consumed samples: 1504144 | elapsed time per iteration (ms): 23611.0 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11980/ 159576 | consumed samples: 1509264 | elapsed time per iteration (ms): 23634.8 | learning rate: 6.000E-05 | global batch size: 512 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 11990/ 159576 | consumed samples: 1514464 | elapsed time per iteration (ms): 23596.0 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 15:14:45,510] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=3052, lr=[5.999919375575235e-05, 5.999919375575235e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 12000 loss: nan iter time (s): 0.012 samples/sec: 43274.454
iteration 12000/ 159576 | consumed samples: 1519744 | elapsed time per iteration (ms): 24091.4 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 12000 | lm loss value: 7.282808E+00 | lm loss PPL: 1.455068E+03 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-27 15:15:22,225] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step12000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 32585.61
iteration 12010/ 159576 | consumed samples: 1525024 | elapsed time per iteration (ms): 30246.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
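The validation printout reports perplexity as the exponential of the language-model loss; a quick sanity check of the iteration-12000 numbers above (a sketch, not part of the original log):

import math

lm_loss = 7.282808         # "lm loss value" at iteration 12000
print(math.exp(lm_loss))   # ~1455.07, matching "lm loss PPL: 1.455068E+03"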
iteration 12020/ 159576 | consumed samples: 1530304 | elapsed time per iteration (ms): 24139.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12030/ 159576 | consumed samples: 1535584 | elapsed time per iteration (ms): 24280.0 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 15:28:02] PULSE: tr8-104B is running for 11:33:46 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12040/ 159576 | consumed samples: 1540864 | elapsed time per iteration (ms): 23963.9 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12050/ 159576 | consumed samples: 1546144 | elapsed time per iteration (ms): 24135.8 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12060/ 159576 | consumed samples: 1551424 | elapsed time per iteration (ms): 24044.3 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12070/ 159576 | consumed samples: 1556704 | elapsed time per iteration (ms): 24087.4 | learning rate: 6.000E-05 | global batch size: 528 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12080/ 159576 | consumed samples: 1562064 | elapsed time per iteration (ms): 24400.0 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12090/ 159576 | consumed samples: 1567504 | elapsed time per iteration (ms): 24552.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12100/ 159576 | consumed samples: 1572944 | elapsed time per iteration (ms): 24886.7 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12110/ 159576 | consumed samples: 1578384 | elapsed time per iteration (ms): 24781.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12120/ 159576 | consumed samples: 1583824 | elapsed time per iteration (ms): 24493.1 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12130/ 159576 | consumed samples: 1589264 | elapsed time per iteration (ms): 24851.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
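The PULSE heartbeat lines compress the job's allocation into SLURM's bracketed hostlist notation. A minimal expander sketch (ours, for illustration; it handles only the single-level, non-zero-padded ranges seen in these logs):

import re

def expand_hostlist(hostlist):
    # Expand SLURM-style "r6i5n[7-8],r6i6n0" into individual host names.
    hosts = []
    for m in re.finditer(r'([^,\[]+)(?:\[([^\]]+)\])?,?', hostlist):
        prefix, ranges = m.group(1), m.group(2)
        if ranges is None:
            hosts.append(prefix)
            continue
        for part in ranges.split(','):
            lo, _, hi = part.partition('-')
            for i in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{prefix}{i}")
    return hosts

print(expand_hostlist("r6i5n[7-8],r6i6n0"))  # ['r6i5n7', 'r6i5n8', 'r6i6n0']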
iteration 12140/ 159576 | consumed samples: 1594704 | elapsed time per iteration (ms): 24746.4 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12150/ 159576 | consumed samples: 1600144 | elapsed time per iteration (ms): 24578.3 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12160/ 159576 | consumed samples: 1605584 | elapsed time per iteration (ms): 24469.2 | learning rate: 6.000E-05 | global batch size: 544 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12170/ 159576 | consumed samples: 1611152 | elapsed time per iteration (ms): 24994.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 16:28:40] PULSE: tr8-104B is running for 12:34:24 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12180/ 159576 | consumed samples: 1616752 | elapsed time per iteration (ms): 25275.1 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12190/ 159576 | consumed samples: 1622352 | elapsed time per iteration (ms): 25176.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12200/ 159576 | consumed samples: 1627952 | elapsed time per iteration (ms): 25167.8 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12210/ 159576 | consumed samples: 1633552 | elapsed time per iteration (ms): 25057.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12220/ 159576 | consumed samples: 1639152 | elapsed time per iteration (ms): 25147.4 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12230/ 159576 | consumed samples: 1644752 | elapsed time per iteration (ms): 25198.7 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12240/ 159576 | consumed samples: 1650352 | elapsed time per iteration (ms): 24894.2 | learning rate: 6.000E-05 | global batch size: 560 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
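Each record above is fixed-format, which makes the gradual slowdown easy to quantify: elapsed time per iteration creeps from ~22 s at batch size 480 to ~26 s at 592 as the rampup proceeds. A throwaway parser sketch (ours, not part of the training code) for pulling those fields out of a saved log:

import re

# Extract (iteration, consumed samples, ms/iter, global batch size) from
# Megatron-style log records like the ones above.
REC = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \| consumed samples:\s+(\d+) \| "
    r"elapsed time per iteration \(ms\): ([\d.]+) \| learning rate: \S+ \| "
    r"global batch size:\s+(\d+)"
)

def parse(lines):
    for line in lines:
        m = REC.search(line)
        if m:
            it, samples, ms, gbs = m.groups()
            yield int(it), int(samples), float(ms), int(gbs)

sample = ("iteration 12240/ 159576 | consumed samples: 1650352 | "
          "elapsed time per iteration (ms): 24894.2 | learning rate: "
          "6.000E-05 | global batch size: 560 |")
print(next(parse([sample])))  # (12240, 1650352, 24894.2, 560)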
iteration 12250/ 159576 | consumed samples: 1656016 | elapsed time per iteration (ms): 25306.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12260/ 159576 | consumed samples: 1661776 | elapsed time per iteration (ms): 25946.7 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12270/ 159576 | consumed samples: 1667536 | elapsed time per iteration (ms): 25714.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12280/ 159576 | consumed samples: 1673296 | elapsed time per iteration (ms): 25863.6 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12290/ 159576 | consumed samples: 1679056 | elapsed time per iteration (ms): 26038.1 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12300/ 159576 | consumed samples: 1684816 | elapsed time per iteration (ms): 25611.4 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12310/ 159576 | consumed samples: 1690576 | elapsed time per iteration (ms): 25819.3 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 17:28:18] PULSE: tr8-104B is running for 13:34:02 since 2021-09-27T03:54:16 (1188168 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r6i7n[7-8],r7i0n[0-5],r7i1n[7-8],r7i2n[0-1,5,8],r7i3n2,r7i5n7,r7i6n[1-4,8],r7i7n[0-4,6-8],r8i0n[0-8],r8i1n[0-4],r8i2n8,r8i3n[0-3,8],r8i4n[0-1],r8i6n[2-3,5-6],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n[0,3-8],r9i3n[0-2,6-8],r9i4n[0-6,8],r9i5n[0-8],r9i6n[0-8],r9i7n[1-8])
iteration 12320/ 159576 | consumed samples: 1696336 | elapsed time per iteration (ms): 25983.5 | learning rate: 6.000E-05 | global batch size: 576 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12330/ 159576 | consumed samples: 1702128 | elapsed time per iteration (ms): 25674.0 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 12340/ 159576 | consumed samples: 1708048 | elapsed time per iteration (ms): 26437.1 | learning rate: 6.000E-05 | global batch size: 592 | loss scale: 1.0 | grad norm: 0.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
Killing subprocess 76100
Killing subprocess 76101
Killing subprocess 76102
Killing subprocess 76103
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
"/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in main() File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main sigkill_handler(signal.SIGTERM, None) # not coming back File "/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/tr1-13B/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '8', '--num-layers', '32', '--hidden-size', '16384', '--ffn-hidden-size', '20480', '--num-attention-heads', '32', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--rampup-batch-size', '16', '16', '6_000_000', '--global-batch-size', '2048', '--train-samples', '300_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--seed', '42', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.999', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-decay-style', 'cosine', '--lr-decay-samples', '126_953_125', '--lr-warmup-samples', '216_320', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '10', '--save-interval', '1500', '--eval-interval', '1000', '--eval-iters', '5', '--codecarbon-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1188168.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' died with . 
srun: error: r6i5n7: task 0: Exited with exit code 1
srun: Terminating job step 1188168.0
[... several hundred interleaved "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" lines from the remaining launcher processes elided ...]
srun: error: r8i1n2: task 43: Exited with exit code 1
[... matching "srun: error: <node>: task <n>: Exited with exit code 1" lines for the remaining tasks elided ...]
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
Killing subprocess 32020
Killing subprocess 32021
Killing subprocess 32022
Killing subprocess 32023
Main process received SIGTERM, exiting
[... matching "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" sequences from the launcher on every other node, interleaved in the original stream, folded ...]
slurmstepd: error: *** STEP 1271130.0 ON r7i6n1 CANCELLED AT 2021-09-27T17:43:09 ***
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
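The "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" pairs come from each node's launcher reaping its per-GPU workers when Slurm cancels the step. A minimal sketch of that pattern, not the launcher's actual code; the worker count of 4 and the sleep payload are placeholders:

    import signal
    import subprocess
    import sys

    # One worker per GPU; 4 assumed here purely for illustration.
    workers = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(3600)"])
        for _ in range(4)
    ]

    def on_sigterm(signum, frame):
        # Mirror the log: terminate each child, then exit the parent.
        for proc in workers:
            print(f"Killing subprocess {proc.pid}")
            proc.terminate()
        print("Main process received SIGTERM, exiting")
        sys.exit(1)

    signal.signal(signal.SIGTERM, on_sigterm)

    for proc in workers:
        proc.wait()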
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[... every rank prints the same op report, ninja/op tables, async_io warning, environment info, and git info; the concurrent writes interleaved character-by-character in the original stream, and the duplicates are folded here ...]
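The "DeepSpeed general environment info" block above is the diagnostic that DeepSpeed's `ds_report` utility prints. The torch-level rows can also be reproduced directly from public torch attributes; a minimal sketch (the printed values are machine-specific, not taken from this log):

    import torch

    # Mirrors the torch rows of the environment report above.
    print("torch install path ...............", torch.__path__)
    print("torch version ....................", torch.__version__)
    print("torch cuda version ...............", torch.version.cuda)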
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at transformer_inference .. [NO] ....... [OKAY] runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op nameop name................ op name ................ installed................ ................ installed ..installed installed.. compatible....compatible compatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam cpu_adam[YES] ............... ............... ..................... [YES] [YES] [OKAY] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name ninjaninjaninjaninja .................................... .................................... [OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- fused_adam fused_adam ............. .............fused_lamb[NO]............. .......[NO].............[NO] [OKAY] .......[NO] ....... .......[OKAY][OKAY]fused_lamb op name ................op nameop name................ installed ................installed.................. ..compatibleinstalledinstalled compatible [OKAY].............fused_lamb ..--------------------------------------------------..-------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- op nameop nameop name op name................................ installedinstalled................ ................ .... installed installed compatiblecompatible.. fused_lamb [NO]............. ............. ....... [NO] [NO][OKAY] .............. [OKAY][OKAY]sparse_attn cpu_adam ...............cpu_adam [YES]............... cpu_adam ......cpu_adam [YES] ...............[OKAY] ............... ..-------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- ............ [NO] ....... [OKAY] ......[YES] [YES]......[OKAY] ......[OKAY] cpu_adam cpu_adam............... ...............cpu_adamcpu_adam[YES] ...............[YES]..................... ...... 
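The "compatible" column above only asserts that an op could be JIT-built on this system; [YES]/[NO] under "installed" says whether it was prebuilt into the wheel. The probe behind the table can be rerun by hand; the sketch below is a minimal approximation, assuming the `deepspeed.ops.op_builder` classes and their `is_compatible()` method as they exist in this 0.4.x tree (builder names and module paths may differ in other releases).

# Minimal sketch of the per-op compatibility probe, assuming the
# DeepSpeed 0.4.x op_builder layout (the builder classes, .name and
# .is_compatible() are taken from that tree).
from deepspeed.ops.op_builder import (CPUAdamBuilder, FusedAdamBuilder,
                                      FusedLambBuilder, SparseAttnBuilder,
                                      TransformerBuilder,
                                      StochasticTransformerBuilder)

for builder in (CPUAdamBuilder(), FusedAdamBuilder(), FusedLambBuilder(),
                SparseAttnBuilder(), TransformerBuilder(),
                StochasticTransformerBuilder()):
    # Mirrors the "compatible" column: [OKAY] means the op can be
    # JIT-compiled here, regardless of whether it is pre-installed.
    print(f"{builder.name:<24}", "[OKAY]" if builder.is_compatible() else "[NO]")

In practice the `ds_report` utility that ships with DeepSpeed prints this same table, together with the environment summary below, in one shot.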
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
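The same facts can be recovered from a live Python session; a small sketch using stock `torch`/`deepspeed` attributes follows (whether `deepspeed.__version__` carries the `+bc17042` git suffix is an assumption about this custom big-science build).

# Sketch: print the environment info above from Python.
import subprocess
import torch
import deepspeed

print("torch install path ....", torch.__path__)         # package dir list
print("torch version .........", torch.__version__)      # 1.8.1 in this run
print("torch cuda version ....", torch.version.cuda)     # 11.1 in this run
print("deepspeed install path ", deepspeed.__path__)
print("deepspeed version .....", deepspeed.__version__)  # 0.4.2+bc17042 here
# nvcc has no Python-side attribute; shell out for its release line.
print(subprocess.run(["nvcc", "--version"], capture_output=True,
                     text=True).stdout.strip().splitlines()[-1])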
-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at ....... ....... fused_lamb [OKAY][OKAY] [OKAY] ............. runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja [NO] fused_lamb.......fused_lamb fused_lamb ............. .............[OKAY] ............. [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY]sparse_attn [OKAY]-------------------------------------------------- op name------------------------------------------------------------------------------------------------------------------------------------------------------ ................op name op nameop nameinstalled ................ .................. ................ installedcompatibleinstalled ..installed-------------------------------------------------- .. compatible .. sparse_attn ........................transformer............ ............[NO][NO][NO] [NO]....... ....... ....... .......[OKAY] [OKAY][OKAY][OKAY] compatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- transformertransformer transformerstochastic_transformer............ .........................[NO] [NO][NO][NO] ....... ..................... [OKAY] [OKAY] [OKAY][OKAY] cpu_adam ............... [YES] ...... cpu_adamcpu_adam[OKAY]cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformerstochastic_transformer ... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adam fused_adam fused_adam ............. .............fused_lamb ............. [NO] [NO] .............[NO] ....... ....... [NO] ....... [OKAY] [OKAY] ....... [OKAY] [OKAY] fused_lambfused_lamb ..........................fused_lamb [NO][NO]............. ..............[NO] [OKAY][OKAY]....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformersparse_attn ............ ........................sparse_attn [NO][NO][NO] ................... .......[OKAY] ....... [OKAY] [NO] [OKAY] ....... stochastic_transformer[OKAY]transformertransformer ......................... transformer [NO][NO][NO] ............ .............. [NO]....... [OKAY] [OKAY] .......[OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. . [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] ninjaninjaninjaninja .................................... 
....................................[OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name op name................................op name ................installed installed ................ ..installed .. compatibleinstalled.. compatible..--------------------------------------------------compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam..................... cpu_adam ...............[OKAY] [YES] .....................[YES] [OKAY] [YES] ......fused_adam ......[OKAY]............. [OKAY][NO]fused_adam ....... .............[OKAY] fused_adam[NO] fused_lamb ................................. fused_adam [OKAY][NO].............[NO] [NO] ....... ....... .......fused_lamb [OKAY][OKAY] .............[OKAY] [NO]fused_lamb fused_lamb ....... ............. ............. [OKAY] [NO] [NO]sparse_attn .......................... [OKAY][OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer ............transformer sparse_attn sparse_attn[NO] ........................................... [NO][OKAY][NO][NO] ....... .......stochastic_transformer[OKAY] ....... [OKAY] [OKAY]. transformer [NO] transformer ................... stochastic_transformer [NO] ............. [OKAY] [NO]....... [NO] .......[OKAY]....... [OKAY] [OKAY] stochastic_transformer .stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name-------------------------------------------------- ................................ op name op nameinstalledinstalled .................... ................ compatibleinstalled compatible -------------------------------------------------- installed ..-------------------------------------------------- ..compatible compatible-------------------------------------------------- --------------------------------------------------cpu_adam ...............cpu_adam [YES]............... ......[YES]cpu_adam ......[OKAY] ............... [OKAY]cpu_adam [YES]............... ......[YES] [OKAY]...... [OKAY]fused_adam fused_adam............. .............[NO] [NO]....... fused_adam .......[OKAY]fused_adam ..........................[OKAY] [NO][NO]fused_lamb ..............fused_lamb............. .............[OKAY][NO][OKAY] [NO]....... .......[OKAY]fused_lambfused_lamb [OKAY] ............. ............. [NO] ....... [OKAY] [NO] ....... [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]sparse_attn....... ....... ............[OKAY][OKAY] sparse_attn [NO] ............transformer....... transformer [NO] [OKAY]............ ............ ....... [NO] [NO] transformer.......[OKAY] .......[OKAY]............ transformer[OKAY] [NO]............ .......stochastic_transformer[NO] [OKAY]stochastic_transformer ........ [NO][OKAY] .stochastic_transformer....... [NO][OKAY] ........stochastic_transformer [OKAY][NO] . .......[NO] [OKAY]....... 
[OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op nameop name................ op name ................................installed installed................ installed.. installed .... compatible compatible.. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... .................................... [YES] [OKAY] [YES][YES]...... ...... ......[OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam [NO]..........................fused_lamb ....................[NO][NO] [NO][OKAY].............. [OKAY][OKAY]....... [OKAY]fused_lamb .............fused_lambfused_lamb [NO] ............. ............. ....... [NO] [NO] [OKAY]sparse_attn ....... ....... ............[OKAY][OKAY] [NO] ....... [OKAY] sparse_attn ............transformer [NO]............ .......sparse_attn [NO] [OKAY]sparse_attn................... [NO][OKAY]............ transformer ....... [NO] stochastic_transformer [OKAY] ....... ............ . [OKAY] [NO] [NO]....... transformertransformer [OKAY]....... ........................ [OKAY] [NO]stochastic_transformer [NO]....... ........[OKAY] [NO][OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer .. [NO] [NO]....... [OKAY]....... 
[OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................................................installed installedinstalledinstalled.. .. ....compatible compatiblecompatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] ............... ......cpu_adam ...............[YES] [OKAY] .....................[YES] [OKAY]...... [YES] [OKAY]......fused_adam [OKAY]............. [NO]fused_adam .................... [OKAY][NO] fused_adam ....... fused_lamb[OKAY]............. fused_adam.............[NO]fused_lamb [NO]................................. .......[NO] [NO][OKAY].......[OKAY] .......[OKAY] [OKAY]fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] sparse_attn....... sparse_attn ............ [OKAY]............ [NO] [NO]....... .......[OKAY] sparse_attn[OKAY] transformer............ ............transformer[NO] sparse_attn [NO]................... ............[NO] ....... [OKAY][NO][OKAY]....... .......[OKAY] transformer[OKAY] stochastic_transformer............ stochastic_transformer[NO]transformer. ....................[NO] [NO] [OKAY] .......[NO] ....... [OKAY].......[OKAY] stochastic_transformer [OKAY] . [NO] stochastic_transformer....... [OKAY] . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name op name................ ................ ................installed installed................ installed .. installed.. ..compatiblecompatible.. --------------------------------------------------compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]cpu_adam......cpu_adam .....................[OKAY]............... [OKAY] [YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] fused_adam.............. [OKAY][OKAY]fused_adam............. fused_lamb.............[NO] fused_lamb[NO]............. ....................[NO]....... [NO] [OKAY].......[OKAY] .......[OKAY] fused_lamb[OKAY] fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn transformersparse_attntransformer ............ ........................ ............[NO] [NO] [NO] [NO] ............................ [OKAY] [OKAY][OKAY] [OKAY]stochastic_transformertransformer stochastic_transformer ............. transformer. [NO] [NO] ............[NO] ....... .............. [NO][OKAY] [OKAY] [OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name op name ................ ................op name ................ installedinstalledinstalled................ .. .. installedcompatiblecompatible.. .. compatible----------------------------------------------------------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES]cpu_adam [YES].....................cpu_adam ............... [OKAY]...... [YES] [YES] [OKAY] ...... ...... [OKAY] [OKAY] fused_adam ............. fused_adam[NO] .......fused_adam ............. [OKAY] .............fused_adam [NO] [NO].............fused_lamb....... .......[NO] ............. [OKAY][OKAY] ....... [NO] [OKAY].......fused_lambfused_lamb .............[OKAY]fused_lamb............. [NO] .............[NO] ..............[NO] [OKAY][OKAY]....... sparse_attn ............ [NO][OKAY] ....... [OKAY] transformer sparse_attn............sparse_attn [NO]........................ .......[NO][NO] sparse_attn [OKAY].............. ............[OKAY][OKAY] [NO] stochastic_transformertransformer .......transformer............. [OKAY] [NO] ............[NO] transformer.......[NO]....... [OKAY]....... [OKAY]............ [OKAY]stochastic_transformer[NO] ....... .[OKAY] stochastic_transformer [NO] ........ [OKAY][NO] stochastic_transformer....... [OKAY] . [NO] ....... [OKAY] DeepSpeed general environment info: torch install path ............... 
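The report depends on two system prerequisites called out above: `ninja` for JIT-compiling any op listed as `installed .. [NO]`, and the system `libaio` library for the `async_io` op. A quick standard-library check (the `[MISSING]` labels here are ours, not DeepSpeed's):

```python
import shutil
from ctypes.util import find_library

# ninja is needed to JIT-compile any op reported as "installed .. [NO]".
print("ninja .................", "[OKAY]" if shutil.which("ninja") else "[MISSING]")

# async_io needs the system libaio library (the libaio-dev package);
# without it the op is reported as incompatible, as in the warning above.
print("libaio ................", "[OKAY]" if find_library("aio") else "[MISSING]")
```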
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY] ......quantizer [OKAY].............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]utils ......................... [YES][OKAY] ...... [OKAY] utilsquantizer ................................ [YES][NO] ............. [OKAY][OKAY] --------------------------------------------------quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
/bin/sh: line 0: type: git: not found -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
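The git_hash=unknown / git_branch=unknown lines follow directly from the missing git binary: Megatron probes the repository through the shell, and when `type git` fails the probe falls back to "unknown". An illustrative sketch of such a probe (not Megatron's exact code):

    # Illustrative only, not Megatron's exact implementation: resolve the
    # repo hash/branch, falling back to "unknown" when the git binary is
    # absent, as on these compute nodes.
    import subprocess

    def _git(*cmd, default="unknown"):
        try:
            return subprocess.check_output(("git",) + cmd, text=True).strip()
        except (OSError, subprocess.CalledProcessError):
            return default

    git_hash = _git("rev-parse", "--short", "HEAD")
    git_branch = _git("rev-parse", "--abbrev-ref", "HEAD")
    print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")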
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
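Only cpu_adam ships pre-built in this wheel ([YES] under "installed"); the remaining ops are [NO] but compatible, meaning they get built with ninja the first time they are used and cached afterwards. A hedged sketch of what that JIT path looks like, assuming the standard op-builder interface (builder name and load() semantics taken from DeepSpeed's op_builder package; verify for your release):

    # Hedged sketch of the JIT path the report describes: load() compiles
    # the CUDA extension with ninja on first call (cached afterwards) and
    # returns the extension module.
    from deepspeed.ops.op_builder import FusedAdamBuilder

    builder = FusedAdamBuilder()
    assert builder.is_compatible()   # the [OKAY] column above
    fused_adam = builder.load()      # triggers the ninja build once
    print(type(fused_adam))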
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path ...............torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 async_io ............... [NO] ....... [NO] .....................nvcc version 11.2..................... deepspeed install path11.2 transformer_inference .. [NO] ....... [OKAY] ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info utils .................. [YES] ...... [OKAY] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op nameop name ................ ................................installed................ .. installed installedinstalled compatible ......-------------------------------------------------- compatiblecompatiblecompatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] .................................... ............... [YES][YES] [OKAY] [YES]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam .............fused_adam fused_adam[NO] fused_adam ............. ....... .......................... [NO] [OKAY] [NO].......[NO] fused_lamb.......[OKAY] ....... ............. [OKAY] [OKAY][NO] fused_lamb ....................fused_lamb [OKAY][NO]fused_lamb............. ....................[NO] [OKAY] [NO] ....... .......[OKAY] [OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY]............sparse_attnsparse_attn [NO]............transformer ............ ................... [NO] [OKAY] [NO][NO] ....... ....... transformer....... [OKAY]............[OKAY] [OKAY] [NO] transformer.......stochastic_transformer transformer[OKAY]............ . ............ [NO] [NO]stochastic_transformer[NO] ....... ............... [OKAY][NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO]transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]transformer_inference utils ......... ..................[OKAY][NO] [YES]....... ......[OKAY] [OKAY]utils .................. [YES]quantizer utils ...... .............. .................. [OKAY] [NO] [YES] ............. [OKAY][OKAY] quantizer .............. quantizer--------------------------------------------------[NO] ..................... [NO][OKAY] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY] quantizer .............. [NO] --------------------------------------------------....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference .. ..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY]quantizer async_io ............... [NO] ....... [NO] .............. [NO]quantizer ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch version .................... 1.8.1 transformer_inference .. [NO] ....... [OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] torch cuda version ............... 
11.1 async_ioutils ................................. [NO][YES] ............. [NO][OKAY] -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science quantizer .............. [NO] ....... [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer quantizer .............. [NO] ....... [OKAY] .............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] DeepSpeed general environment info: -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY] [NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch cuda version torch version............... ....................11.1 1.8.1nvcc version ..................... torch cuda version11.2 ...............deepspeed install path 11.1........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... deepspeed info11.2 ...................deepspeed install path 0.4.2+bc17042, bc17042, big-science........... deepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. **** Git info for Megatron: git_hash=unknown git_branch=unknown **** async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.1 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ ................installed installed.. installedinstalled ..compatible .. .. compatible--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam ............... [YES] ...............cpu_adam [YES] ...... [YES][OKAY]..................... ......[YES][OKAY] [OKAY]...... [OKAY] fused_adam ............. [NO] fused_adam .......fused_adam ............. [OKAY] ............. [NO] [NO].......fused_adamfused_lamb ............. .......[OKAY]............. [OKAY][NO][NO] fused_lamb..............fused_lamb [OKAY] .............[OKAY]............. fused_lamb[NO] [NO]............. .......[NO]....... [OKAY] ....... [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............ ........................[NO] [NO][NO]sparse_attn ....... ....... ....... [OKAY] [OKAY] [OKAY]............ transformer[NO] transformer............ ............[NO] stochastic_transformer....... [NO].......[OKAY]. ....... [OKAY] [NO] [OKAY] transformer .......stochastic_transformer stochastic_transformer[OKAY]............. [NO][NO] . ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils ..................quantizer [YES].............. ......[NO] [OKAY]....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
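The async_io warning is benign for this run, since the op is only JIT-compiled if actually used; when the op is needed, the fix is the one the warning names. On a machine with root access that would look roughly like the following (a sketch; `DS_BUILD_AIO=1` is an assumption following DeepSpeed's documented `DS_BUILD_*` prebuild convention):

    # install the aio development library the warning asks for (needs root)
    $ apt install libaio-dev
    # optionally pre-build the op at install time instead of relying on JIT
    $ DS_BUILD_AIO=1 pip install deepspeed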
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
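These two lines mean `git` is not on the compute node's PATH, so Megatron records git_hash=unknown and git_branch=unknown instead of the real commit; the run itself is unaffected. The failing probe and the value it would otherwise record look approximately like this (an illustrative reconstruction, not the exact Megatron code):

    # the probe that fails on the compute node, as seen in the log line above
    $ /bin/sh -c 'type git'
    # with git on PATH (e.g. after loading a git module), the hash would resolve as
    $ git rev-parse --short HEAD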
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
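The same version facts the environment report prints can be read back directly from Python, which is handy for spot-checking that every node resolves the identical install (a minimal sketch, assuming the same environment as above):

    # one-liner equivalent of the torch/cuda/deepspeed version lines in the report
    $ python -c "import torch, deepspeed; print(torch.__version__, torch.version.cuda, deepspeed.__version__)"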
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ...................
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 nvcc version torch cuda version..................... ...............11.2 11.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... ...................torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ......utils [OKAY].................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science DeepSpeed general environment info: deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] nvcc version ..................... 11.2 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science op nameop nameop nameop name ................................................................ 
installedinstalledinstalled installed .. .. ..compatible.. compatible compatible --------------------------------------------------compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. cpu_adam ...............cpu_adam cpu_adam[YES]...............cpu_adam [YES] ..................... ............... ......[OKAY][YES] [YES] [OKAY]...... ...... [OKAY][OKAY] async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] fused_adam ............. [NO]fused_adam fused_adam....... fused_adam.......................... [OKAY][NO].............[NO] .......[NO] .......fused_lamb[OKAY] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] ....................[OKAY]fused_lamb [NO] [OKAY]............. utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] fused_lamb[NO].......fused_lamb .................... [OKAY] ............. [OKAY][NO][NO] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] .............. [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- sparse_attn ............ sparse_attn[NO] ................... sparse_attn[NO] [OKAY]sparse_attn....... ............ ............transformer[OKAY][NO] [NO] ....... ............ ....... transformer[OKAY][OKAY][NO] ...................transformer transformer [OKAY][NO] ............ ............ ....... [NO]stochastic_transformer [NO] [OKAY] ....... ....... . [OKAY] [OKAY][NO] stochastic_transformer .......stochastic_transformer stochastic_transformer .[OKAY] .[NO] . [NO].......[NO] ....... [OKAY] ....... [OKAY] [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op nameop name................op name ................................installed ................ installedinstalled .... installedcompatible..compatible ..--------------------------------------------------compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam [YES][YES]cpu_adam............... ..................... ......[YES][OKAY] [YES] [OKAY] ...... ...... [OKAY][OKAY] fused_adam ............. fused_adam[NO] .............fused_adam fused_adam[NO]....... [OKAY]................................. [OKAY][NO] [NO]fused_lamb .......fused_lamb.................... [OKAY].............[NO] [OKAY] .......[NO] fused_lamb[OKAY] ....... .............[OKAY]fused_lamb [NO]............. .......[NO] [OKAY]....... [OKAY]sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY]sparse_attn ....... ............transformer[OKAY]sparse_attn [NO]........................ transformer .......[NO] [NO] ................... [OKAY] [OKAY]....... [NO] [OKAY].......transformer stochastic_transformer [OKAY] ............. transformer [NO]stochastic_transformer [NO] ............ ........ ....... [NO][OKAY][NO][OKAY] ....... .......[OKAY] stochastic_transformer[OKAY] . [NO]stochastic_transformer ....... [OKAY]. [NO] ....... 
[OKAY] /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed general environment info: ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................ op name................ ................ installed ................ installedinstalled .. installed.... compatiblecompatible..compatible compatible-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam cpu_adamcpu_adam...............cpu_adam ...............[YES].............................. ......[YES][YES][YES] [OKAY]...... ............[OKAY] [OKAY][OKAY] fused_adam fused_adam.............fused_adamfused_adam [NO]....................................... .......[NO][NO][NO] [OKAY].............. ....... [OKAY] [OKAY][OKAY] fused_lamb fused_lamb............. fused_lamb.............fused_lamb[NO] [NO].......................... ....... ....... [NO][NO][OKAY][OKAY] .............. [OKAY][OKAY] sparse_attnsparse_attn ............sparse_attn............ sparse_attn [NO][NO] ............ ............ ..............[NO][NO] [OKAY].......[OKAY]....... [OKAY] [OKAY] transformer transformer transformer ............ ............ transformer............ [NO] [NO] [NO] ............ .............. ....... [NO] [OKAY] [OKAY] [OKAY]....... [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer . stochastic_transformer..[NO] [NO] .[NO] ....... ....... .......[NO][OKAY][OKAY] [OKAY]....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op name................op name op name ................ installed................ ................ installed..installed installed....compatible torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................DeepSpeed general environment info: 1.8.1 ..compatiblecompatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam............... ..............................[YES]cpu_adam [YES]...............[YES]...... ......[YES][OKAY]...... torch cuda version ............... 11.1torch install path nvcc version............... ..................... 11.2 deepspeed install path ........... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infotorch version ....................................... 0.4.2+bc17042, bc17042, big-science1.8.1 [OKAY][OKAY]...... [OKAY] deepspeed wheel compiled w. torch cuda version...... ...............torch 1.8, cuda 11.1 11.1 nvcc version ..................... 11.2 DeepSpeed general environment info:DeepSpeed general environment info: fused_adam ............. fused_adamfused_adam[NO] .............fused_adam.................... [NO] [NO]............. [OKAY] ....... deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ....... [NO] [OKAY] .......[OKAY]fused_lamb [OKAY]............. deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 fused_lamb fused_lamb[NO]............. fused_lamb....................[NO] .......[OKAY] [NO] ............. [OKAY] ....... [NO] [OKAY]....... torch cuda versiontorch cuda version .............................. 11.111.1 [OKAY] nvcc versionnvcc version .......................................... 11.211.2 sparse_attn ............sparse_attn [NO]............ .......[NO] sparse_attn[OKAY] sparse_attn ....... ............ ............ transformer[OKAY] [NO] ............[NO]....... transformer [NO] ...................[OKAY] deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science [OKAY].......[NO] transformer [OKAY] transformer....... deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ............ ............ stochastic_transformer [OKAY][NO][NO] ............... stochastic_transformer[NO] [OKAY] [OKAY] ........ [NO][OKAY] stochastic_transformerstochastic_transformer....... [OKAY]. . [NO] .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version DeepSpeed general environment info:............... 11.1 nvcc version ..................... torch install path11.2 deepspeed install path............... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']................... 0.4.2+bc17042, bc17042, big-science torch versiondeepspeed wheel compiled w. .......................... 1.8.1torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. [NO] ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] --------------------------------------------------....... [OKAY] /bin/sh: line 0: type: git: not found --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] /bin/sh: line 0: type: git: not found -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. 
[OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ...............DeepSpeed general environment info: 11.1 nvcc version ..................... 11.2 deepspeed install path torch install path........... ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ...... torch 1.8, cuda 11.1 torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 nvcc version ..................... 11.2 quantizer .............. [NO] ....... [OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] -------------------------------------------------- deepspeed info ................... 0.4.2+bc17042, bc17042, big-science  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.[NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference ..async_io [NO] utils...................... ..................[NO][OKAY] [YES]....... ......[NO] [OKAY]utils .................. quantizer[YES] .................... [NO][OKAY] .......transformer_inference [OKAY]quantizer.. ..............[NO] --------------------------------------------------[NO]....... .......[OKAY] [OKAY] --------------------------------------------------utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... 
[NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed general environment info: async_io ............... [NO] ....... [NO] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] transformer_inference .. [NO] ....... [OKAY] torch version .................... 1.8.1 utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report quantizer .............. [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info: torch cuda version ............... 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch version .................... 
1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
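Each of the 256 ranks prints an identical copy of this git warning, op report, and environment block. When the job lands on a new node it is worth re-checking that the environment still matches the report; a minimal sketch, assuming the conda environment from the paths above is the active one (plain PyTorch introspection, nothing DeepSpeed-specific):

    import torch
    # these should match the environment report above:
    # torch 1.8.1 built against CUDA 11.1
    print(torch.__version__)          # expect: 1.8.1
    print(torch.version.cuda)         # expect: 11.1
    print(torch.cuda.is_available())  # expect: True on a GPU node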
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1271196.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
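The world-size line and the rampup note are consistent with each other: 256 GPUs decompose into 8-way data, 4-way tensor, and 8-way pipeline parallelism, and the global batch grows from 16 to 2048 in 16-sample steps. A quick arithmetic check (plain Python, no Megatron imports; the per-increment sample count is a back-of-the-envelope figure derived from the numbers above, not something the log prints):

    # world size = data-parallel x tensor-parallel x pipeline-parallel
    dp, tp, pp = 8, 4, 8
    assert dp * tp * pp == 256

    # rampup: global batch 16 -> 2048 in steps of 16, over 6,000,000 samples
    start, step, target, ramp_samples = 16, 16, 2048, 6_000_000
    increments = (target - start) // step  # 127 batch-size increases
    print(ramp_samples // increments)      # roughly 47,244 samples per batch size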
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ...................
0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']DeepSpeed general environment info: torch version .................... 1.8.1 torch install path torch cuda version............... ............... 11.1 nvcc version ..................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']11.2 deepspeed install path torch version........... ....................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 1.8.1 deepspeed info ...................torch cuda version 0.4.2+bc17042, bc17042, big-science ............... deepspeed wheel compiled w.11.1 ...... nvcc versiontorch 1.8, cuda 11.1 ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninja ninja ...................................................... .................. [OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name op name ................op name ................ ................................installedinstalled installedinstalled.. .. .. 
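The async_io [NO] above is expected on this cluster: the op needs the libaio headers, and it is only used for NVMe/disk offload, which this run does not enable. A minimal sketch of probing the op's buildability from Python; the import path is an assumption (it matches recent DeepSpeed releases and may differ slightly in 0.4.2):

```python
# Minimal sketch: probe whether DeepSpeed could build the async_io op.
# async_io backs NVMe/disk offload only, so [NO] is harmless for this run.
# The import path below is an assumption for deepspeed 0.4.2; it is the
# location used by recent DeepSpeed releases.
from deepspeed.ops.op_builder import AsyncIOBuilder

# is_compatible() returns False while the libaio headers are missing,
# matching the [WARNING] printed in the log above.
print("async_io compatible:", AsyncIOBuilder().is_compatible())
```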
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja

ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
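For reference, the banner above is DeepSpeed's environment/op report, which every rank prints at startup; it can be regenerated on a login node without launching a job. A sketch, assuming the tr1-13B conda env is active so the `ds_report` console script resolves to deepspeed 0.4.2+bc17042:

```python
# Sketch: regenerate the "DeepSpeed C++/CUDA extension op report" and
# "DeepSpeed general environment info" banners outside of a training job.
# Assumes ds_report (installed alongside the deepspeed package) is on PATH.
import subprocess

subprocess.run(["ds_report"], check=True)
```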
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op nameop name ................................ installed................................installed ..installedinstalled.. compatible.... compatible compatible--------------------------------------------------compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam cpu_adam...............[YES] ..............................[YES]...... [YES][OKAY][YES]...... ............[OKAY] [OKAY][OKAY] fused_adam .............fused_adamfused_adam [NO].............fused_adam............. .......[NO].............[NO] [OKAY][NO]....... ....... ....... [OKAY] [OKAY] fused_lamb[OKAY] ............. fused_lamb[NO]fused_lamb fused_lamb .......................... ....... .............[NO][NO] [OKAY] [NO] .............. .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] sparse_attnsparse_attn....... sparse_attn ............ ............[OKAY] ............ [NO][NO][NO] transformer..................... ............[OKAY][OKAY][OKAY] [NO] .......transformer transformer [OKAY] ............transformer ............ [NO][NO]............ stochastic_transformer....... ....... [NO] [OKAY] [OKAY] ........ [NO][OKAY] stochastic_transformer.......stochastic_transformer [OKAY]stochastic_transformer . . [NO][NO] . ....... ....... [NO] [OKAY][OKAY] ....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version DeepSpeed general environment info:............... 11.1 nvcc version ..................... 11.2torch install path deepspeed install path............... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ................... 0.4.2+bc17042, bc17042, big-science torch versiondeepspeed wheel compiled w. .......................... 1.8.1torch 1.8, cuda 11.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
> setting codecarbon ...
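(codecarbon is the energy/emissions tracker these training runs enable. A sketch of its standalone API, under the assumption that the in-tree integration wraps the same `EmissionsTracker`; `train()` is a hypothetical stand-in for the training loop.)

```python
# Sketch of the codecarbon API this step sets up; the actual Megatron
# integration may differ. train() is a hypothetical placeholder.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(output_dir="emissions")  # writes emissions.csv
tracker.start()
train()
emissions = tracker.stop()  # estimated kg CO2-equivalent
```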
[NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- > initializing torch distributed ... > setting tensorboard ... DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... deepspeed wheel compiled w.0.4.2+bc17042, bc17042, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 8 > setting random seeds to 43 ... [2021-09-27 17:43:40,715] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43 > compiling dataset index builder ... make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data' make: Nothing to be done for 'default'. 
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.304 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
[the compiler warning above is emitted once per rank and per extension build; further copies elided]
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
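(The fused kernels are built through PyTorch's JIT extension loader; a minimal sketch of that mechanism. `torch.utils.cpp_extension.load` is the real API; the source file names below are illustrative placeholders, not Megatron's actual file list.)

```python
# Sketch of the JIT build path behind the "Building extension module ..."
# lines; the source file names are illustrative placeholders.
from torch.utils import cpp_extension

scaled_masked_softmax_cuda = cpp_extension.load(
    name="scaled_masked_softmax_cuda",
    sources=["scaled_masked_softmax.cpp", "scaled_masked_softmax_cuda.cu"],
    verbose=True,  # produces the "Emitting ninja build file ..." output above
)
```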
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 22.376 seconds time to initialize megatron (seconds): 67.410 [after megatron is initialized] datetime: 2021-09-27 17:44:03 building GPT model ... [2021-09-27 17:44:03,479] [INFO] [utils.py:680:see_memory_usage] Before Building Model /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved warnings.warn( /gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved warnings.warn( [2021-09-27 17:44:03,481] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-09-27 17:44:03,481] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.48 GB, percent = 20.0% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1): 29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, 
ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, 
ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, 
model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255} [2021-09-27 17:44:04,887] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=7 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe stage=1 layers=4 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe stage=2 layers=4 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=3 layers=4 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe stage=4 layers=4 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe stage=5 layers=4 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe stage=6 layers=4 27: ParallelTransformerLayerPipe 28: ParallelTransformerLayerPipe 29: ParallelTransformerLayerPipe 30: ParallelTransformerLayerPipe stage=7 layers=8 31: ParallelTransformerLayerPipe 32: 
ParallelTransformerLayerPipe 33: ParallelTransformerLayerPipe 34: ParallelTransformerLayerPipe 35: 36: MixedFusedLayerNorm 37: EmbeddingPipe 38: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560 [2021-09-27 17:44:06,136] [INFO] [utils.py:680:see_memory_usage] After Building Model [2021-09-27 17:44:06,136] [INFO] [utils.py:681:see_memory_usage] MA 3.77 GB Max_MA 3.79 GB CA 3.79 GB Max_CA 4 GB [2021-09-27 17:44:06,137] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.66 GB, percent = 20.1% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792 setting training iterations to 159576 > learning rate decay style: cosine DeepSpeed is enabled. 
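To make the topology and parameter dumps above verifiable at a glance, here is a small sanity-check sketch. The rank formula is inferred from the printed ProcessCoord map, and every constant is copied from the log; nothing here is taken from Megatron or DeepSpeed source.

```python
# Sanity check of the topology and parameter counts printed above.
# Layout: TP=4 (model), DP=8 (data), PP=8 (pipe) -> 4 * 8 * 8 = 256 ranks.
TP, DP, PP = 4, 8, 8

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    # Inferred from the ProcessCoord map: model varies fastest, pipe slowest.
    return pipe * (DP * TP) + data * TP + model

assert coord_to_rank(0, 0, 1) == 1
assert coord_to_rank(0, 1, 0) == 4
assert coord_to_rank(1, 0, 0) == 32
assert coord_to_rank(7, 7, 3) == 255

# Per-TP-rank parameters for the 8 pipeline stages, as printed by Megatron:
stage_params = [1986465792] + [1745293312] * 6 + [1986498560]

total = TP * sum(stage_params)   # each stage is sharded across 4 TP ranks
assert total == 57778896896      # TOTAL_PARAMS reported by the pipe engine

# TOTAL - UNIQUE is exactly 4x the surplus of stage 0 over a plain
# transformer stage, consistent with the word embedding being tied
# between the EmbeddingPipe copies on the first and last stage:
unique = 56814206976             # UNIQUE_PARAMS reported by the pipe engine
assert total - unique == 4 * (1986465792 - 1745293312) == 964689920
```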
[2021-09-27 17:44:06,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science [2021-09-27 17:44:06,292] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False [2021-09-27 17:44:06,292] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-09-27 17:44:06,292] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer [2021-09-27 17:44:06,292] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-09-27 17:44:06,292] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-09-27 17:44:06,292] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-09-27 17:44:06,292] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000 [2021-09-27 17:44:06,292] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000 [2021-09-27 17:44:06,292] [INFO] [stage2.py:108:__init__] CPU Offload: False [2021-09-27 17:44:06,292] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False [2021-09-27 17:44:11,004] [INFO] [stage2.py:419:__init__] optimizer state initialized [2021-09-27 17:44:11,004] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-09-27 17:44:11,004] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-09-27 17:44:11,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-09-27 17:44:11,005] [INFO] [config.py:900:print] DeepSpeedEngine configuration: [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] amp_enabled .................. False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] amp_params ................... False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] disable_allgather ............ False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] dump_state ................... False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] eigenvalue_enabled ........... False [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1 [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0 [2021-09-27 17:44:11,005] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 
100 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] eigenvalue_verbose ........... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] elasticity_enabled ........... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] fp16_enabled ................. True [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] global_rank .................. 0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] gradient_clipping ............ 1.0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] loss_scale ................... 0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] memory_breakdown ............. False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] optimizer_name ............... None [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] optimizer_params ............. None [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] pld_enabled .................. False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] pld_params ................... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] prescale_gradients ........... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_groups .............. 1 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_offset .............. 1000 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_period .............. 1000 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_rounding ............ 0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_start_bits .......... 16 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_target_bits ......... 8 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_training_enabled .... False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_type ................ 0 [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] quantize_verbose ............. False [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] scheduler_name ............... None [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] scheduler_params ............. None [2021-09-27 17:44:11,006] [INFO] [config.py:904:print] sparse_attention ............. None [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] steps_per_print .............. 
2000 [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] tensorboard_enabled .......... False [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] tensorboard_output_path ...... [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] train_batch_size ............. 2048 [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1 [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] use_quantizer_kernel ......... False [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] wall_clock_breakdown ......... False [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] world_size ................... 8 [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] zero_allow_untested_optimizer False [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] zero_enabled ................. True [2021-09-27 17:44:11,007] [INFO] [config.py:904:print] zero_optimization_stage ...... 1 [2021-09-27 17:44:11,007] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 2.048000e+03, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-27 17:44:11,007] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1 [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=131 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=129 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M) [2021-09-27 17:44:11,311] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) 
STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[analogous RANK=... STAGE=... LAYERS=... lines follow for the remaining ranks; every rank reports the same TOTAL_PARAMS and UNIQUE_PARAMS, and STAGE_PARAMS matches the per-stage counts above]
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 8 ZeRO state_dicts for rank 196
loading 8 zero partition checkpoints for rank 196
["successfully loaded 8 ZeRO state_dicts for rank N" and "loading 8 zero partition checkpoints for rank N" are printed once per rank; the remaining 255 pairs are elided]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210
time (ms) | load-checkpoint: 56578.08
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 17:45:07
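The batch-size figures scattered through the config dump are mutually consistent. A quick sketch (constants copied from the config print; the ramp-up remark is an inference from the per-iteration log further below):

```python
# How DeepSpeed's batch-size settings above fit together.
micro_batch_size   = 1     # train_micro_batch_size_per_gpu
grad_acc_steps     = 256   # gradient_accumulation_steps (= micro_batches)
data_parallel_size = 8     # 256 GPUs / (TP=4 * PP=8)

train_batch_size = micro_batch_size * grad_acc_steps * data_parallel_size
assert train_batch_size == 2048   # matches the printed train_batch_size

# Note: the iteration log below reports "global batch size: 80" rather
# than 2048 -- at iteration 6220 the batch-size ramp-up schedule is still
# in effect, so the effective global batch is well below the target.
```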
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 17:45:07
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.174718 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> WARNING: could not find index map files, building the indices on rank 0 ...
> last epoch number of samples (36925554) is smaller than 80% of number of samples per epoch (131537223), setting separate_last_epoch to True
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:46:37 CEST)" was missed by 0:00:21.460713
> elapsed time to build and save doc-idx mapping (seconds): 74.353737
    using: number of documents: 288714672, number of epochs: 3, sequence length: 2048, total number of samples: 394611669
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-27 17:47:37 CEST)" was missed by 0:00:11.662010
> elapsed time to build and save sample-idx mapping (seconds): 24.775998
> building shuffle index with split [0, 263074446) and [263074446, 394611669) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 26.026031
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.089 seconds
    total number of samples: 394611670
    total number of epochs: 3
> WARNING: could not find index map files, building the indices on rank 0 ...
> only one epoch required, setting separate_last_epoch to False
> elapsed time to build and save doc-idx mapping (seconds): 0.979826
    using: number of documents: 15211521, number of epochs: 1, sequence length: 2048, total number of samples: 6927160
> elapsed time to build and save sample-idx mapping (seconds): 0.364344
> building shuffle index with split [0, 6927160) and [6927160, 6927160) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 0.312714
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.034 seconds
    total number of samples: 6927161
    total number of epochs: 1
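For the record, the split boundaries and epoch counts above can be reproduced with a few lines of arithmetic. The sketch below mirrors the rounding Megatron-LM applies to a 949/50/1 split and the separate_last_epoch test (80% threshold) quoted in the warning; it is a reconstruction, not the library code itself:

```python
import math

def train_valid_test_split(size: int, weights=(949, 50, 1)):
    # Round each split independently, then push the rounding error back in.
    fracs = [w / sum(weights) for w in weights]
    idx = [0]
    for f in fracs:
        idx.append(idx[-1] + int(round(f * size)))
    diff = idx[-1] - size
    for i in range(1, len(idx)):
        idx[i] -= diff
    return idx

print(train_valid_test_split(304_230_423))
# -> [0, 288714672, 303926193, 304230423], matching the logged boundaries

# Epoch math for the 300M-sample train target:
samples_per_epoch = 131_537_223          # 2048-token samples per pass, from the log
num_epochs = math.ceil(300_000_000 / samples_per_epoch)                # 3
last_epoch_samples = 300_000_000 - (num_epochs - 1) * samples_per_epoch
print(last_epoch_samples)                                              # 36925554
print(last_epoch_samples < 0.80 * samples_per_epoch)  # True -> separate_last_epoch
```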
> WARNING: could not find index map files, building the indices on rank 0 ...
> only one epoch required, setting separate_last_epoch to False
> elapsed time to build and save doc-idx mapping (seconds): 0.019056
    using: number of documents: 304230, number of epochs: 1, sequence length: 2048, total number of samples: 137383
> elapsed time to build and save sample-idx mapping (seconds): 0.007505
> building shuffle index with split [0, 137383) and [137383, 137383) ...
> elapsed time to build and save shuffle-idx mapping (seconds): 0.021865
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.110 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 17:47:20
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 64587.82 | train/valid/test-data-iterators-setup: 131511.20
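The doc-idx/sample-idx/shuffle-idx maps are ordinary .npy files, which is why "loading" them takes well under a second: they can be memory-mapped instead of read eagerly. A sketch of that pattern with numpy, using the train index-map prefix from the log:

```python
import numpy as np

prefix = ("/gpfswork/rech/six/commun/datasets-custom/oscar-en/"
          "meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_")

# mmap_mode="r" maps the file instead of copying it into RAM; pages are
# faulted in lazily on first access, so the "load" is nearly instant.
doc_idx = np.load(prefix + "doc_idx.npy", mmap_mode="r")
sample_idx = np.load(prefix + "sample_idx.npy", mmap_mode="r")
shuffle_idx = np.load(prefix + "shuffle_idx.npy", mmap_mode="r")

print(len(sample_idx))  # raw array length, 394611670 for the train set here
```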
[before the start of training step] datetime: 2021-09-27 17:47:20
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 17:47:20,277] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 225] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68798828125 | reserved: 22492.0 | max reserved: 22492.0
[Rank 226] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 1] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 2] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23246.0 | max reserved: 23246.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.68994140625 | reserved: 22492.0 | max reserved: 22492.0
[Rank 227] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11885.6884765625 | reserved: 21700.0 | max reserved: 21700.0
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 19180.4 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.355129E+00 | loss scale: 4096.0 | grad norm: 93535.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 3] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13899.01416015625 | reserved: 23278.0 | max reserved: 23278.0
[Rank 33] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20130.0 | max reserved: 20130.0
[Rank 66] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19950.0 | max reserved: 19950.0
[Rank 34] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20250.0 | max reserved: 20250.0
[Rank 98] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19886.0 | max reserved: 19886.0
[Rank 130] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19338.0 | max reserved: 19338.0
[Rank 97] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19402.0 | max reserved: 19402.0
[Rank 161] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 20170.0 | max reserved: 20170.0
[Rank 129] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19050.0 | max reserved: 19050.0
[Rank 193] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18826.0 | max reserved: 18826.0
[Rank 65] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19582.0 | max reserved: 19582.0
[Rank 194] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18970.0 | max reserved: 18970.0
[Rank 162] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19146.0 | max reserved: 19146.0
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20676.0 | max reserved: 20676.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 20296.0 | max reserved: 20296.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 20392.0 | max reserved: 20392.0
[Rank 35] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12082.4677734375 | reserved: 20030.0 | max reserved: 20030.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 19636.0 | max reserved: 19636.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 19012.0 | max reserved: 19012.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 20008.0 | max reserved: 20008.0
[Rank 99] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11538.466796875 | reserved: 19870.0 | max reserved: 19870.0
[Rank 67] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11810.46728515625 | reserved: 19582.0 | max reserved: 19582.0
[Rank 131] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11266.46630859375 | reserved: 19278.0 | max reserved: 19278.0
[Rank 195] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10722.46533203125 | reserved: 18970.0 | max reserved: 18970.0
[Rank 163] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10994.4658203125 | reserved: 18826.0 | max reserved: 18826.0
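The per-rank memory lines above are straightforward to produce with torch.cuda's accounting (note how ranks 0-3, presumably the first pipeline stage, peak roughly 2 GB higher than later stages). A sketch of a reporter that emits the same format; the function name is made up:

```python
import torch

def report_memory(rank: int, iteration: int) -> str:
    # All four counters are standard torch.cuda statistics, in bytes.
    mb = 1024 * 1024
    return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
            f"allocated: {torch.cuda.memory_allocated() / mb} | "
            f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
            f"reserved: {torch.cuda.memory_reserved() / mb} | "
            f"max reserved: {torch.cuda.max_memory_reserved() / mb}")
```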
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 17628.9 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.325471E+00 | loss scale: 4096.0 | grad norm: 104626.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 17585.3 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.313773E+00 | loss scale: 4096.0 | grad norm: 104488.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 17683.9 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.302388E+00 | loss scale: 4096.0 | grad norm: 99404.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 17834.3 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.322264E+00 | loss scale: 4096.0 | grad norm: 134601.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 17647.5 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.319476E+00 | loss scale: 4096.0 | grad norm: 142879.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 17607.4 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.321982E+00 | loss scale: 4096.0 | grad norm: 114136.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 17636.6 | learning rate: 5.534E-05 | global batch size: 80 | lm loss: 6.272703E+00 | loss scale: 4096.0 | grad norm: 101011.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 17537.9 | learning rate: 5.556E-05 | global batch size: 80 | lm loss: 6.295881E+00 | loss scale: 4096.0 | grad norm: 116874.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 17634.4 | learning rate: 5.578E-05 | global batch size: 80 | lm loss: 6.324175E+00 | loss scale: 4096.0 | grad norm: 115938.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 17796.6 | learning rate: 5.600E-05 | global batch size: 80 | lm loss: 6.301260E+00 | loss scale: 4096.0 | grad norm: 128639.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 17684.4 | learning rate: 5.622E-05 | global batch size: 80 | lm loss: 6.325212E+00 | loss scale: 4096.0 | grad norm: 122331.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 17751.1 | learning rate: 5.645E-05 | global batch size: 80 | lm loss: 6.315152E+00 | loss scale: 4096.0 | grad norm: 107257.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 18:28:25] PULSE: tr8-104B is running for 44:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 17472.1 | learning rate: 5.667E-05 | global batch size: 80 | lm loss: 6.305837E+00 | loss scale: 4096.0 | grad norm: 92922.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 17585.4 | learning rate: 5.689E-05 | global batch size: 80 | lm loss: 6.291708E+00 | loss scale: 4096.0 | grad norm: 128015.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 17756.4 | learning rate: 5.711E-05 | global batch size: 80 | lm loss: 6.336868E+00 | loss scale: 4096.0 | grad norm: 132675.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 17470.3 | learning rate: 5.733E-05 | global batch size: 80 | lm loss: 6.319473E+00 | loss scale: 4096.0 | grad norm: 121903.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 17849.6 | learning rate: 5.755E-05 | global batch size: 80 | lm loss: 6.295473E+00 | loss scale: 4096.0 | grad norm: 108842.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 17525.6 | learning rate: 5.778E-05 | global batch size: 80 | lm loss: 6.305953E+00 | loss scale: 4096.0 | grad norm: 110142.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 17695.6 | learning rate: 5.800E-05 | global batch size: 80 | lm loss: 6.327058E+00 | loss scale: 4096.0 | grad norm: 149204.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 17590.8 | learning rate: 5.822E-05 | global batch size: 80 | lm loss: 6.301820E+00 | loss scale: 4096.0 | grad norm: 90947.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 17793.7 | learning rate: 5.844E-05 | global batch size: 80 | lm loss: 6.343626E+00 | loss scale: 4096.0 | grad norm: 345234.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 17631.2 | learning rate: 5.866E-05 | global batch size: 80 | lm loss: 6.323440E+00 | loss scale: 4096.0 | grad norm: 96087.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 17688.1 | learning rate: 5.889E-05 | global batch size: 80 | lm loss: 6.310754E+00 | loss scale: 4096.0 | grad norm: 142702.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
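Between iterations 6220 and 6500 the learning rate climbs roughly linearly from 5.378E-05 to its 6.000E-05 maximum and then holds. Megatron schedules warmup by consumed samples rather than by iteration; a sketch with an assumed warmup length, since the actual --lr-warmup-samples value is not visible in this excerpt:

```python
def get_lr(consumed_samples: int,
           max_lr: float = 6.0e-5,
           warmup_samples: int = 216_320) -> float:
    # warmup_samples is hypothetical, chosen so the maximum is reached near
    # iteration 6500 (~216800 consumed samples), as observed in the log.
    if consumed_samples < warmup_samples:
        return max_lr * consumed_samples / warmup_samples
    return max_lr  # constant phase; decay would only start much later

print(get_lr(194_400))  # ~5.39e-05, close to the 5.378E-05 logged at 6220
```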
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 17884.9 | learning rate: 5.911E-05 | global batch size: 80 | lm loss: 6.326996E+00 | loss scale: 4096.0 | grad norm: 139353.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 17777.5 | learning rate: 5.933E-05 | global batch size: 80 | lm loss: 6.303541E+00 | loss scale: 4096.0 | grad norm: 163735.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 17758.4 | learning rate: 5.955E-05 | global batch size: 80 | lm loss: 6.318764E+00 | loss scale: 4096.0 | grad norm: 122570.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 17864.1 | learning rate: 5.977E-05 | global batch size: 80 | lm loss: 6.307048E+00 | loss scale: 4096.0 | grad norm: 116946.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 17901.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.315722E+00 | loss scale: 4096.0 | grad norm: 93922.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 17582.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323491E+00 | loss scale: 4096.0 | grad norm: 148357.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 17725.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330975E+00 | loss scale: 4096.0 | grad norm: 103909.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 17788.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.330465E+00 | loss scale: 4096.0 | grad norm: 112690.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 17722.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.325342E+00 | loss scale: 4096.0 | grad norm: 74738.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 17778.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.338161E+00 | loss scale: 4096.0 | grad norm: 92386.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 19:28:18] PULSE: tr8-104B is running for 1:44:52 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 17633.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.346842E+00 | loss scale: 4096.0 | grad norm: 91412.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 17585.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.354182E+00 | loss scale: 4096.0 | grad norm: 106016.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 17723.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.339022E+00 | loss scale: 4096.0 | grad norm: 99292.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 17636.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343359E+00 | loss scale: 4096.0 | grad norm: 142334.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 17663.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.340461E+00 | loss scale: 4096.0 | grad norm: 152141.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 17548.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323914E+00 | loss scale: 4096.0 | grad norm: 170495.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6620/ 159576 | consumed samples: 226400 | elapsed time per iteration (ms): 17566.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304215E+00 | loss scale: 4096.0 | grad norm: 160242.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 17951.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312865E+00 | loss scale: 4096.0 | grad norm: 104923.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 17693.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.337115E+00 | loss scale: 4096.0 | grad norm: 162544.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 17707.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.327879E+00 | loss scale: 4096.0 | grad norm: 80497.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 17584.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.404206E+00 | loss scale: 4096.0 | grad norm: 136886.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 17615.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.359778E+00 | loss scale: 4096.0 | grad norm: 123501.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 17812.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.318851E+00 | loss scale: 4096.0 | grad norm: 118146.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 17690.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.324978E+00 | loss scale: 4096.0 | grad norm: 127513.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 17679.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312429E+00 | loss scale: 4096.0 | grad norm: 141251.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 17730.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.304575E+00 | loss scale: 8192.0 | grad norm: 354806.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 17817.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.343853E+00 | loss scale: 8192.0 | grad norm: 400003.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 17886.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.329220E+00 | loss scale: 8192.0 | grad norm: 354798.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 17869.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.341031E+00 | loss scale: 8192.0 | grad norm: 452433.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 18328.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.325079E+00 | loss scale: 8192.0 | grad norm: 272354.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 17158.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.350076E+00 | loss scale: 4096.0 | grad norm: 109464.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 20:32:07] PULSE: tr8-104B is running for 2:48:41 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 18779.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.347258E+00 | loss scale: 4096.0 | grad norm: 151362.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 18764.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.483617E+00 | loss scale: 4096.0 | grad norm: 144409.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
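The global batch size steps from 80 to 96 at ~237k consumed samples (and later to 112 and 128): Megatron-style batch-size rampup, where the batch grows by a fixed increment each time a fixed share of the ramp samples has been consumed. The sketch below uses assumed parameters (16 16 6_000_000 toward a final batch of 2048); they are inferred only because they reproduce the transition points logged here, not quoted from the run config:

```python
def global_batch_size(consumed_samples: int,
                      start: int = 16, increment: int = 16,
                      ramp_samples: int = 6_000_000,
                      final: int = 2048) -> int:
    # Assumed Megatron-style "--rampup-batch-size 16 16 6_000_000" config.
    steps = (final - start) // increment        # 127 plateaus of +16
    samples_per_step = ramp_samples // steps    # ~47,244 samples per plateau
    bs = start + increment * (consumed_samples // samples_per_step)
    return min(bs, final)

# Matches the transitions above: still 80 at 236000, 96 by 236912.
assert global_batch_size(236_000) == 80 and global_batch_size(236_912) == 96
assert global_batch_size(284_032) == 112 and global_batch_size(331_120) == 128
```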
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 18830.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.459402E+00 | loss scale: 4096.0 | grad norm: 106762.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 18594.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.457979E+00 | loss scale: 4096.0 | grad norm: 159826.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 18590.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.445743E+00 | loss scale: 4096.0 | grad norm: 104586.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.371418E+00 | loss scale: 4096.0 | grad norm: 181059.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 18734.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385859E+00 | loss scale: 4096.0 | grad norm: 126958.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 18634.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.351850E+00 | loss scale: 4096.0 | grad norm: 154126.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 18587.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341198E+00 | loss scale: 4096.0 | grad norm: 133262.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 19013.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.317137E+00 | loss scale: 4096.0 | grad norm: 101860.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 18789.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.332655E+00 | loss scale: 4096.0 | grad norm: 467416.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 18654.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.385090E+00 | loss scale: 4096.0 | grad norm: 154062.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6890/ 159576 | consumed samples: 250352 | elapsed time per iteration (ms): 18644.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.355402E+00 | loss scale: 4096.0 | grad norm: 154349.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6900/ 159576 | consumed samples: 251312 | elapsed time per iteration (ms): 18495.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.365808E+00 | loss scale: 4096.0 | grad norm: 95313.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6910/ 159576 | consumed samples: 252272 | elapsed time per iteration (ms): 18802.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.598378E+00 | loss scale: 4096.0 | grad norm: 84678.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6920/ 159576 | consumed samples: 253232 | elapsed time per iteration (ms): 18641.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.314456E+00 | loss scale: 4096.0 | grad norm: 122716.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6930/ 159576 | consumed samples: 254192 | elapsed time per iteration (ms): 18564.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.121927E+00 | loss scale: 4096.0 | grad norm: 283384.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6940/ 159576 | consumed samples: 255152 | elapsed time per iteration (ms): 18549.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 1.023865E+01 | loss scale: 4096.0 | grad norm: 42359.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6950/ 159576 | consumed samples: 256112 | elapsed time per iteration (ms): 17675.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 9.249577E+00 | loss scale: 2048.0 | grad norm: 78368.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6960/ 159576 | consumed samples: 257072 | elapsed time per iteration (ms): 18443.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 8.389180E+00 | loss scale: 2048.0 | grad norm: 40490.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6970/ 159576 | consumed samples: 258032 | elapsed time per iteration (ms): 18545.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.529938E+00 | loss scale: 2048.0 | grad norm: 14218.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 21:35:01] PULSE: tr8-104B is running for 3:51:35 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 6980/ 159576 | consumed samples: 258992 | elapsed time per iteration (ms): 18379.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.102215E+00 | loss scale: 2048.0 | grad norm: 18580.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6990/ 159576 | consumed samples: 259952 | elapsed time per iteration (ms): 18355.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 7.018941E+00 | loss scale: 2048.0 | grad norm: 17882.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7000/ 159576 | consumed samples: 260912 | elapsed time per iteration (ms): 18505.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.942125E+00 | loss scale: 2048.0 | grad norm: 26860.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
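The loss-scale column shows standard dynamic loss scaling at work: the scale doubled to 8192 at ~6710 after a long overflow-free stretch, fell back to 4096, and halved again to 2048 around 6950 while the loss spiked toward 10.2. A generic sketch of the mechanism (the window length is illustrative; this is not the exact DeepSpeed/Megatron implementation):

```python
class DynamicLossScaler:
    """Halve the fp16 loss scale on gradient overflow; double it again
    after a window of consecutive overflow-free steps."""
    def __init__(self, scale: float = 4096.0, window: int = 500,
                 min_scale: float = 1.0):
        self.scale, self.window, self.min_scale = scale, window, min_scale
        self.good_steps = 0

    def update(self, found_overflow: bool) -> float:
        if found_overflow:
            self.scale = max(self.scale / 2, self.min_scale)  # 4096 -> 2048
            self.good_steps = 0       # the overflowing step is skipped
        else:
            self.good_steps += 1
            if self.good_steps == self.window:
                self.scale *= 2       # 4096 -> 8192 after a calm stretch
                self.good_steps = 0
        return self.scale
```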
------------------------------------------------------------------------------------------------
 validation loss at iteration 7000 | lm loss value: 6.872679E+00 | lm loss PPL: 9.655315E+02 |
------------------------------------------------------------------------------------------------
iteration 7010/ 159576 | consumed samples: 261872 | elapsed time per iteration (ms): 19970.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.816376E+00 | loss scale: 2048.0 | grad norm: 40294.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7020/ 159576 | consumed samples: 262832 | elapsed time per iteration (ms): 18648.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.821559E+00 | loss scale: 2048.0 | grad norm: 25012.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7030/ 159576 | consumed samples: 263792 | elapsed time per iteration (ms): 18478.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.893867E+00 | loss scale: 2048.0 | grad norm: 39565.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7040/ 159576 | consumed samples: 264752 | elapsed time per iteration (ms): 18670.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.871474E+00 | loss scale: 2048.0 | grad norm: 22832.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7050/ 159576 | consumed samples: 265712 | elapsed time per iteration (ms): 18521.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.875928E+00 | loss scale: 2048.0 | grad norm: 26237.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7060/ 159576 | consumed samples: 266672 | elapsed time per iteration (ms): 18543.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.827568E+00 | loss scale: 2048.0 | grad norm: 31639.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7070/ 159576 | consumed samples: 267632 | elapsed time per iteration (ms): 18564.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.711889E+00 | loss scale: 2048.0 | grad norm: 46310.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7080/ 159576 | consumed samples: 268592 | elapsed time per iteration (ms): 18629.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.683693E+00 | loss scale: 2048.0 | grad norm: 31484.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7090/ 159576 | consumed samples: 269552 | elapsed time per iteration (ms): 18473.8 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.627121E+00 | loss scale: 2048.0 | grad norm: 45017.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7100/ 159576 | consumed samples: 270512 | elapsed time per iteration (ms): 18806.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.627071E+00 | loss scale: 2048.0 | grad norm: 57880.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7110/ 159576 | consumed samples: 271472 | elapsed time per iteration (ms): 18537.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.608931E+00 | loss scale: 2048.0 | grad norm: 67724.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
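The perplexity reported by the validation pass is simply the exponential of the LM loss:

```python
import math

lm_loss = 6.872679
print(math.exp(lm_loss))  # ~965.53, matching the logged 9.655315E+02
```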
iteration 7120/ 159576 | consumed samples: 272432 | elapsed time per iteration (ms): 18556.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.592625E+00 | loss scale: 2048.0 | grad norm: 67655.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7130/ 159576 | consumed samples: 273392 | elapsed time per iteration (ms): 18620.1 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.769730E+00 | loss scale: 2048.0 | grad norm: 50594.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7140/ 159576 | consumed samples: 274352 | elapsed time per iteration (ms): 18517.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.749163E+00 | loss scale: 2048.0 | grad norm: 30940.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7150/ 159576 | consumed samples: 275312 | elapsed time per iteration (ms): 18726.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.695554E+00 | loss scale: 2048.0 | grad norm: 49756.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 22:31:42] PULSE: tr8-104B is running for 4:48:16 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7160/ 159576 | consumed samples: 276272 | elapsed time per iteration (ms): 18567.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.630823E+00 | loss scale: 2048.0 | grad norm: 46573.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7170/ 159576 | consumed samples: 277232 | elapsed time per iteration (ms): 18787.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.637067E+00 | loss scale: 2048.0 | grad norm: 47650.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7180/ 159576 | consumed samples: 278192 | elapsed time per iteration (ms): 18669.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.663966E+00 | loss scale: 2048.0 | grad norm: 54677.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7190/ 159576 | consumed samples: 279152 | elapsed time per iteration (ms): 18711.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.603532E+00 | loss scale: 2048.0 | grad norm: 75914.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7200/ 159576 | consumed samples: 280112 | elapsed time per iteration (ms): 18682.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.571133E+00 | loss scale: 2048.0 | grad norm: 74379.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7210/ 159576 | consumed samples: 281072 | elapsed time per iteration (ms): 18622.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.584048E+00 | loss scale: 2048.0 | grad norm: 75888.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7220/ 159576 | consumed samples: 282032 | elapsed time per iteration (ms): 18555.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.554535E+00 | loss scale: 2048.0 | grad norm: 90934.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
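The periodic PULSE lines come from a watchdog that polls SLURM for the job's start time and node list. A rough sketch of such a monitor, not the actual BigScience SLURM-tools script (the squeue format fields used are standard):

```python
import subprocess
from datetime import datetime

def pulse(job_name: str = "tr8-104B") -> None:
    # %S = job start time, %N = allocated node list (standard squeue fields).
    out = subprocess.check_output(
        ["squeue", "-n", job_name, "-h", "-o", "%S %N"], text=True).strip()
    start_str, nodes = out.split(" ", 1)
    elapsed = datetime.now() - datetime.fromisoformat(start_str)
    h, rem = divmod(int(elapsed.total_seconds()), 3600)
    m, s = divmod(rem, 60)
    print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] PULSE: {job_name} is running "
          f"for {h}:{m:02d}:{s:02d} since {start_str} ({nodes})")
```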
iteration 7230/ 159576 | consumed samples: 282992 | elapsed time per iteration (ms): 18600.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.558411E+00 | loss scale: 2048.0 | grad norm: 54832.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7240/ 159576 | consumed samples: 284032 | elapsed time per iteration (ms): 19119.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.585645E+00 | loss scale: 2048.0 | grad norm: 116769.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19421.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.554094E+00 | loss scale: 2048.0 | grad norm: 79780.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19643.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.545351E+00 | loss scale: 2048.0 | grad norm: 153165.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19873.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.548807E+00 | loss scale: 2048.0 | grad norm: 96725.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19830.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532312E+00 | loss scale: 2048.0 | grad norm: 85054.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19469.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.535855E+00 | loss scale: 2048.0 | grad norm: 66255.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19578.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.583752E+00 | loss scale: 2048.0 | grad norm: 61901.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19646.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.539584E+00 | loss scale: 2048.0 | grad norm: 68238.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19642.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.526649E+00 | loss scale: 2048.0 | grad norm: 69527.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19508.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.514026E+00 | loss scale: 2048.0 | grad norm: 63745.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19676.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.519949E+00 | loss scale: 2048.0 | grad norm: 96730.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 23:32:04] PULSE: tr8-104B is running for 5:48:38 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7350/ 159576 | consumed samples: 296352 | elapsed time per iteration (ms): 19539.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510521E+00 | loss scale: 2048.0 | grad norm: 95201.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7360/ 159576 | consumed samples: 297472 | elapsed time per iteration (ms): 19834.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532115E+00 | loss scale: 2048.0 | grad norm: 269153.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7370/ 159576 | consumed samples: 298592 | elapsed time per iteration (ms): 19564.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501956E+00 | loss scale: 2048.0 | grad norm: 89998.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7380/ 159576 | consumed samples: 299712 | elapsed time per iteration (ms): 19672.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.522272E+00 | loss scale: 2048.0 | grad norm: 75724.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7390/ 159576 | consumed samples: 300832 | elapsed time per iteration (ms): 19562.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.511443E+00 | loss scale: 2048.0 | grad norm: 89537.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7400/ 159576 | consumed samples: 301952 | elapsed time per iteration (ms): 19728.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.534271E+00 | loss scale: 2048.0 | grad norm: 79036.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7410/ 159576 | consumed samples: 303072 | elapsed time per iteration (ms): 19731.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.550716E+00 | loss scale: 2048.0 | grad norm: 60002.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7420/ 159576 | consumed samples: 304192 | elapsed time per iteration (ms): 19733.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.546501E+00 | loss scale: 2048.0 | grad norm: 69147.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7430/ 159576 | consumed samples: 305312 | elapsed time per iteration (ms): 19483.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.560014E+00 | loss scale: 2048.0 | grad norm: 75450.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7440/ 159576 | consumed samples: 306432 | elapsed time per iteration (ms): 19613.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.523249E+00 | loss scale: 2048.0 | grad norm: 104393.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7450/ 159576 | consumed samples: 307552 | elapsed time per iteration (ms): 19763.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.510474E+00 | loss scale: 4096.0 | grad norm: 189305.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7460/ 159576 | consumed samples: 308672 | elapsed time per iteration (ms): 19871.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501906E+00 | loss scale: 4096.0 | grad norm: 277069.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7470/ 159576 | consumed samples: 309792 | elapsed time per iteration (ms): 18903.0 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.497433E+00 | loss scale: 4096.0 | grad norm: 225644.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7480/ 159576 | consumed samples: 310912 | elapsed time per iteration (ms): 19707.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.488033E+00 | loss scale: 4096.0 | grad norm: 230163.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7490/ 159576 | consumed samples: 312032 | elapsed time per iteration (ms): 19720.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505843E+00 | loss scale: 4096.0 | grad norm: 238654.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7500/ 159576 | consumed samples: 313152 | elapsed time per iteration (ms): 18950.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.477815E+00 | loss scale: 2048.0 | grad norm: 106401.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-28 00:24:01,519] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 17115.61
iteration 7510/ 159576 | consumed samples: 314272 | elapsed time per iteration (ms): 21118.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.494813E+00 | loss scale: 2048.0 | grad norm: 111065.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7520/ 159576 | consumed samples: 315392 | elapsed time per iteration (ms): 19805.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.508061E+00 | loss scale: 2048.0 | grad norm: 108163.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 00:32:54] PULSE: tr8-104B is running for 6:49:28 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7530/ 159576 | consumed samples: 316512 | elapsed time per iteration (ms): 19675.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.531902E+00 | loss scale: 2048.0 | grad norm: 113133.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7540/ 159576 | consumed samples: 317632 | elapsed time per iteration (ms): 19542.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.512622E+00 | loss scale: 2048.0 | grad norm: 124840.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
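Checkpoints are written on a fixed iteration cadence (the save at 7500 above took ~17 s). A trivial sketch of that gate, with an assumed --save-interval of 1500, since the real value is not shown in this excerpt:

```python
def should_save(iteration: int, save_interval: int = 1500) -> bool:
    # save_interval is hypothetical; 7500 is simply a multiple of 1500.
    return save_interval > 0 and iteration % save_interval == 0

assert should_save(7500) and not should_save(7510)
```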
iteration 7550/ 159576 | consumed samples: 318752 | elapsed time per iteration (ms): 19516.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.501436E+00 | loss scale: 2048.0 | grad norm: 133229.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7560/ 159576 | consumed samples: 319872 | elapsed time per iteration (ms): 19503.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490542E+00 | loss scale: 2048.0 | grad norm: 71964.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7570/ 159576 | consumed samples: 320992 | elapsed time per iteration (ms): 19421.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.521871E+00 | loss scale: 2048.0 | grad norm: 88801.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7580/ 159576 | consumed samples: 322112 | elapsed time per iteration (ms): 19481.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.505743E+00 | loss scale: 2048.0 | grad norm: 284454.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7590/ 159576 | consumed samples: 323232 | elapsed time per iteration (ms): 19560.8 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490807E+00 | loss scale: 2048.0 | grad norm: 110863.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7600/ 159576 | consumed samples: 324352 | elapsed time per iteration (ms): 19566.7 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.490352E+00 | loss scale: 2048.0 | grad norm: 99394.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7610/ 159576 | consumed samples: 325472 | elapsed time per iteration (ms): 19546.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.487664E+00 | loss scale: 2048.0 | grad norm: 98963.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7620/ 159576 | consumed samples: 326592 | elapsed time per iteration (ms): 19448.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.495935E+00 | loss scale: 2048.0 | grad norm: 80186.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7630/ 159576 | consumed samples: 327712 | elapsed time per iteration (ms): 19586.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.485136E+00 | loss scale: 2048.0 | grad norm: 90794.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7640/ 159576 | consumed samples: 328832 | elapsed time per iteration (ms): 19579.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.484132E+00 | loss scale: 2048.0 | grad norm: 120050.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7650/ 159576 | consumed samples: 329952 | elapsed time per iteration (ms): 19625.6 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.474982E+00 | loss scale: 2048.0 | grad norm: 132690.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7660/ 159576 | consumed samples: 331120 | elapsed time per iteration (ms): 19869.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.502007E+00 | loss scale: 2048.0 | grad norm: 141077.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7670/ 159576 | consumed samples: 332400 | elapsed time per iteration (ms): 20699.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.459695E+00 | loss scale: 2048.0 | grad norm: 170892.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7680/ 159576 | consumed samples: 333680 | elapsed time per iteration (ms): 20602.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471451E+00 | loss scale: 2048.0 | grad norm: 186408.144 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7690/ 159576 | consumed samples: 334960 | elapsed time per iteration (ms): 20925.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.450164E+00 | loss scale: 2048.0 | grad norm: 126551.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7700/ 159576 | consumed samples: 336240 | elapsed time per iteration (ms): 20872.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.483758E+00 | loss scale: 2048.0 | grad norm: 113828.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 01:32:21] PULSE: tr8-104B is running for 7:48:55 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7710/ 159576 | consumed samples: 337520 | elapsed time per iteration (ms): 20786.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.474139E+00 | loss scale: 2048.0 | grad norm: 92984.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7720/ 159576 | consumed samples: 338800 | elapsed time per iteration (ms): 20911.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465121E+00 | loss scale: 2048.0 | grad norm: 101949.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7730/ 159576 | consumed samples: 340080 | elapsed time per iteration (ms): 20160.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.493755E+00 | loss scale: 1024.0 | grad norm: 47045.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7740/ 159576 | consumed samples: 341360 | elapsed time per iteration (ms): 20757.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.475374E+00 | loss scale: 1024.0 | grad norm: 62044.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7750/ 159576 | consumed samples: 342640 | elapsed time per iteration (ms): 20801.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.480064E+00 | loss scale: 1024.0 | grad norm: 55223.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20712.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.477321E+00 | loss scale: 1024.0 | grad norm: 75612.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7760/ 159576 | consumed samples: 343920 | elapsed time per iteration (ms): 20712.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.477321E+00 | loss scale: 1024.0 | grad norm: 75612.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7770/ 159576 | consumed samples: 345200 | elapsed time per iteration (ms): 20773.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.486430E+00 | loss scale: 1024.0 | grad norm: 57309.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7780/ 159576 | consumed samples: 346480 | elapsed time per iteration (ms): 20686.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.465924E+00 | loss scale: 1024.0 | grad norm: 78208.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7790/ 159576 | consumed samples: 347760 | elapsed time per iteration (ms): 20744.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.439983E+00 | loss scale: 1024.0 | grad norm: 85978.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7800/ 159576 | consumed samples: 349040 | elapsed time per iteration (ms): 20858.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466323E+00 | loss scale: 1024.0 | grad norm: 83254.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7810/ 159576 | consumed samples: 350320 | elapsed time per iteration (ms): 20728.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.452026E+00 | loss scale: 1024.0 | grad norm: 82300.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7820/ 159576 | consumed samples: 351600 | elapsed time per iteration (ms): 20746.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.471143E+00 | loss scale: 1024.0 | grad norm: 70196.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7830/ 159576 | consumed samples: 352880 | elapsed time per iteration (ms): 20801.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.484294E+00 | loss scale: 1024.0 | grad norm: 52460.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7840/ 159576 | consumed samples: 354160 | elapsed time per iteration (ms): 20885.5 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.492403E+00 | loss scale: 1024.0 | grad norm: 61833.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7850/ 159576 | consumed samples: 355440 | elapsed time per iteration (ms): 20657.1 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.466279E+00 | loss scale: 1024.0 | grad norm: 62285.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7860/ 159576 | consumed samples: 356720 | elapsed time per iteration (ms): 19964.7 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.448762E+00 | loss scale: 512.0 | grad norm: 76192.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7870/ 159576 | consumed samples: 358000 | elapsed time per iteration (ms): 20780.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.468709E+00 | loss scale: 512.0 | grad norm: 27166.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
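Editor's note: the loss scale dropping from 2048.0 to 1024.0 (around iteration 7730) and again to 512.0 (around 7860) is the standard mixed-precision dynamic loss scaler backing off after fp16 gradient overflows; the later [Rank 0] line's skipped=20 is the cumulative count of steps dropped that way. A minimal sketch of the usual halve-on-overflow / grow-after-a-quiet-window policy (the constants are illustrative, not this run's exact configuration):

    class DynamicLossScaler:
        """Halve the scale on overflow, double it after a run of clean steps."""
        def __init__(self, scale=2048.0, growth_interval=1000, min_scale=1.0):
            self.scale = scale
            self.growth_interval = growth_interval
            self.min_scale = min_scale
            self.clean_steps = 0

        def update(self, found_overflow: bool):
            if found_overflow:
                # The optimizer step is skipped and the scale backs off.
                self.scale = max(self.scale / 2, self.min_scale)
                self.clean_steps = 0
            else:
                self.clean_steps += 1
                if self.clean_steps % self.growth_interval == 0:
                    self.scale *= 2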
iteration 7880/ 159576 | consumed samples: 359280 | elapsed time per iteration (ms): 20507.3 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.619281E+00 | loss scale: 512.0 | grad norm: 27451.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 02:32:25] PULSE: tr8-104B is scheduled to start in 17:52:43 (at 2021-09-28T20:25:09) (1277218 on 'gpu_p13' partition)
[2021-09-28 02:32:25] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277216 on 'gpu_p13' partition)
[2021-09-28 02:32:25] PULSE: tr8-104B is running for 8:48:59 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 7890/ 159576 | consumed samples: 360560 | elapsed time per iteration (ms): 20685.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.639037E+00 | loss scale: 512.0 | grad norm: 21160.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7900/ 159576 | consumed samples: 361840 | elapsed time per iteration (ms): 20486.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.220924E+00 | loss scale: 512.0 | grad norm: 53815.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7910/ 159576 | consumed samples: 363120 | elapsed time per iteration (ms): 20468.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.521174E+00 | loss scale: 512.0 | grad norm: 36754.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7920/ 159576 | consumed samples: 364400 | elapsed time per iteration (ms): 20813.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.992101E+00 | loss scale: 512.0 | grad norm: 30259.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7930/ 159576 | consumed samples: 365680 | elapsed time per iteration (ms): 20655.4 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.722378E+00 | loss scale: 512.0 | grad norm: 10480.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7940/ 159576 | consumed samples: 366960 | elapsed time per iteration (ms): 20950.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.599706E+00 | loss scale: 512.0 | grad norm: 6665.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7950/ 159576 | consumed samples: 368240 | elapsed time per iteration (ms): 20603.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.160472E+00 | loss scale: 512.0 | grad norm: 8111.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7960/ 159576 | consumed samples: 369520 | elapsed time per iteration (ms): 20624.6 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.089828E+00 | loss scale: 512.0 | grad norm: 6778.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7970/ 159576 | consumed samples: 370800 | elapsed time per iteration (ms): 20691.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.228701E+00 | loss scale: 512.0 | grad norm: 20747.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7980/ 159576 | consumed samples: 372080 | elapsed time per iteration (ms): 20583.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.185581E+00 | loss scale: 512.0 | grad norm: 12240.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 7990/ 159576 | consumed samples: 373360 | elapsed time per iteration (ms): 20386.0 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.279394E+00 | loss scale: 512.0 | grad norm: 11910.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 03:13:26,386] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=20, lr=[5.99997953854461e-05, 5.99997953854461e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 8000/ 159576 | consumed samples: 374640 | elapsed time per iteration (ms): 20647.9 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 7.117274E+00 | loss scale: 512.0 | grad norm: 6363.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 8000 loss: 6.9948 iter time (s): 0.010 samples/sec: 12373.041
------------------------------------------------------------------------------------------------
validation loss at iteration 8000 | lm loss value: 6.930026E+00 | lm loss PPL: 1.022520E+03 |
------------------------------------------------------------------------------------------------
iteration 8010/ 159576 | consumed samples: 375920 | elapsed time per iteration (ms): 22250.2 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.907596E+00 | loss scale: 512.0 | grad norm: 5175.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8020/ 159576 | consumed samples: 377200 | elapsed time per iteration (ms): 20702.8 | learning rate: 6.000E-05 | global batch size: 128 | lm loss: 6.903972E+00 | loss scale: 512.0 | grad norm: 8915.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8030/ 159576 | consumed samples: 378544 | elapsed time per iteration (ms): 21181.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.942516E+00 | loss scale: 512.0 | grad norm: 8113.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8040/ 159576 | consumed samples: 379984 | elapsed time per iteration (ms): 21914.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.923864E+00 | loss scale: 512.0 | grad norm: 19249.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8050/ 159576 | consumed samples: 381424 | elapsed time per iteration (ms): 21865.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.876669E+00 | loss scale: 512.0 | grad norm: 7890.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 03:32:27] PULSE: tr8-104B is scheduled to start in 19:12:32 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 03:32:27] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 03:32:27] PULSE: tr8-104B is running for 9:49:01 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
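Editor's note: the reported perplexity is just the exponential of the validation lm loss; the iteration-8000 numbers above check out:

    import math
    lm_loss = 6.930026                # validation lm loss value at iteration 8000
    print(math.exp(lm_loss))          # ~1022.52, i.e. the logged "lm loss PPL: 1.022520E+03"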
iteration 8060/ 159576 | consumed samples: 382864 | elapsed time per iteration (ms): 21779.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.788055E+00 | loss scale: 512.0 | grad norm: 9618.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8070/ 159576 | consumed samples: 384304 | elapsed time per iteration (ms): 21643.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.808229E+00 | loss scale: 512.0 | grad norm: 8857.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8080/ 159576 | consumed samples: 385744 | elapsed time per iteration (ms): 21639.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.901846E+00 | loss scale: 512.0 | grad norm: 8983.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8090/ 159576 | consumed samples: 387184 | elapsed time per iteration (ms): 22052.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.863363E+00 | loss scale: 512.0 | grad norm: 9399.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8100/ 159576 | consumed samples: 388624 | elapsed time per iteration (ms): 21866.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.843295E+00 | loss scale: 512.0 | grad norm: 8690.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8110/ 159576 | consumed samples: 390064 | elapsed time per iteration (ms): 21853.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893594E+00 | loss scale: 512.0 | grad norm: 13780.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8120/ 159576 | consumed samples: 391504 | elapsed time per iteration (ms): 21812.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.924708E+00 | loss scale: 512.0 | grad norm: 7097.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8130/ 159576 | consumed samples: 392944 | elapsed time per iteration (ms): 21586.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.829758E+00 | loss scale: 512.0 | grad norm: 7266.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8140/ 159576 | consumed samples: 394384 | elapsed time per iteration (ms): 21935.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.820535E+00 | loss scale: 512.0 | grad norm: 7758.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8150/ 159576 | consumed samples: 395824 | elapsed time per iteration (ms): 21921.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.822125E+00 | loss scale: 512.0 | grad norm: 6965.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8160/ 159576 | consumed samples: 397264 | elapsed time per iteration (ms): 21703.6 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.756792E+00 | loss scale: 512.0 | grad norm: 9871.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8170/ 159576 | consumed samples: 398704 | elapsed time per iteration (ms): 21847.9 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.773450E+00 | loss scale: 512.0 | grad norm: 12746.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8180/ 159576 | consumed samples: 400144 | elapsed time per iteration (ms): 21833.8 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.785934E+00 | loss scale: 512.0 | grad norm: 5598.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8190/ 159576 | consumed samples: 401584 | elapsed time per iteration (ms): 21797.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.870234E+00 | loss scale: 512.0 | grad norm: 6782.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8200/ 159576 | consumed samples: 403024 | elapsed time per iteration (ms): 21810.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.838039E+00 | loss scale: 512.0 | grad norm: 9577.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8210/ 159576 | consumed samples: 404464 | elapsed time per iteration (ms): 21905.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.807652E+00 | loss scale: 512.0 | grad norm: 11918.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 04:33:02] PULSE: tr8-104B is scheduled to start in 18:11:57 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 04:33:02] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 04:33:02] PULSE: tr8-104B is running for 10:49:36 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8220/ 159576 | consumed samples: 405904 | elapsed time per iteration (ms): 21977.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.819595E+00 | loss scale: 512.0 | grad norm: 6882.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8230/ 159576 | consumed samples: 407344 | elapsed time per iteration (ms): 21630.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.880849E+00 | loss scale: 512.0 | grad norm: 17414.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8240/ 159576 | consumed samples: 408784 | elapsed time per iteration (ms): 21894.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.930541E+00 | loss scale: 512.0 | grad norm: 7836.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8250/ 159576 | consumed samples: 410224 | elapsed time per iteration (ms): 21731.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.906449E+00 | loss scale: 512.0 | grad norm: 7978.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8260/ 159576 | consumed samples: 411664 | elapsed time per iteration (ms): 21776.5 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.893109E+00 | loss scale: 512.0 | grad norm: 9114.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8270/ 159576 | consumed samples: 413104 | elapsed time per iteration (ms): 22166.2 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.885992E+00 | loss scale: 512.0 | grad norm: 13085.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
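Editor's note: the global batch size stepping from 128 to 144 around iteration 8030 (and later to 160 and 176) is a batch-size ramp-up schedule: the batch grows by a fixed increment each time a set number of samples has been consumed, which is why the consumed-samples delta per 10 iterations jumps from 1,280 to 1,440. A minimal sketch of such a linear ramp-up; the start/increment/ramp-up-samples/final values here are placeholders, not this run's actual configuration:

    def global_batch_size(consumed_samples, start=32, increment=16,
                          rampup_samples=500_000, final=1024):
        """Linear ramp-up: grow the batch by `increment` at evenly spaced sample milestones."""
        if consumed_samples >= rampup_samples:
            return final
        steps = (final - start) // increment          # number of increments in the ramp
        samples_per_step = rampup_samples / steps     # samples consumed per increment
        return start + increment * int(consumed_samples // samples_per_step)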
iteration 8280/ 159576 | consumed samples: 414544 | elapsed time per iteration (ms): 21762.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.789729E+00 | loss scale: 512.0 | grad norm: 11443.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8290/ 159576 | consumed samples: 415984 | elapsed time per iteration (ms): 21743.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.784861E+00 | loss scale: 512.0 | grad norm: 10437.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8300/ 159576 | consumed samples: 417424 | elapsed time per iteration (ms): 21878.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831153E+00 | loss scale: 512.0 | grad norm: 6842.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8310/ 159576 | consumed samples: 418864 | elapsed time per iteration (ms): 21680.7 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.847891E+00 | loss scale: 512.0 | grad norm: 8236.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8320/ 159576 | consumed samples: 420304 | elapsed time per iteration (ms): 21650.4 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.831273E+00 | loss scale: 512.0 | grad norm: 10757.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8330/ 159576 | consumed samples: 421744 | elapsed time per iteration (ms): 21761.1 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.866577E+00 | loss scale: 512.0 | grad norm: 9414.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8340/ 159576 | consumed samples: 423184 | elapsed time per iteration (ms): 22000.3 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 6.927114E+00 | loss scale: 512.0 | grad norm: 22264.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8350/ 159576 | consumed samples: 424624 | elapsed time per iteration (ms): 21732.0 | learning rate: 6.000E-05 | global batch size: 144 | lm loss: 7.098891E+00 | loss scale: 512.0 | grad norm: 10280.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8360/ 159576 | consumed samples: 426160 | elapsed time per iteration (ms): 22517.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.958164E+00 | loss scale: 1024.0 | grad norm: 13178.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8370/ 159576 | consumed samples: 427760 | elapsed time per iteration (ms): 23182.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.889060E+00 | loss scale: 1024.0 | grad norm: 18842.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8380/ 159576 | consumed samples: 429360 | elapsed time per iteration (ms): 23097.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.878168E+00 | loss scale: 1024.0 | grad norm: 18421.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 05:32:36] PULSE: tr8-104B is scheduled to start in 17:12:23 (at 2021-09-28T22:45:00) (1277218 on 'gpu_p13' partition)
[2021-09-28 05:32:36] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 05:32:36] PULSE: tr8-104B is running for 11:49:10 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8390/ 159576 | consumed samples: 430960 | elapsed time per iteration (ms): 22911.1 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.836983E+00 | loss scale: 1024.0 | grad norm: 21055.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8400/ 159576 | consumed samples: 432560 | elapsed time per iteration (ms): 23311.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.867126E+00 | loss scale: 1024.0 | grad norm: 13309.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8410/ 159576 | consumed samples: 434160 | elapsed time per iteration (ms): 22945.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.896465E+00 | loss scale: 1024.0 | grad norm: 24249.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22797.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.923830E+00 | loss scale: 1024.0 | grad norm: 16621.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 23019.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.940806E+00 | loss scale: 1024.0 | grad norm: 15050.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 23026.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.984757E+00 | loss scale: 1024.0 | grad norm: 22968.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22903.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.970832E+00 | loss scale: 1024.0 | grad norm: 25206.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22992.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.992513E+00 | loss scale: 1024.0 | grad norm: 9219.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 23036.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.053975E+00 | loss scale: 1024.0 | grad norm: 9743.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22710.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087634E+00 | loss scale: 1024.0 | grad norm: 36403.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22994.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.142048E+00 | loss scale: 1024.0 | grad norm: 8807.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22707.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.160313E+00 | loss scale: 1024.0 | grad norm: 9148.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 15434.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 15729.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 06:32:50] PULSE: tr8-104B is scheduled to start in 17:29:26 (at 2021-09-29T00:02:17) (1277218 on 'gpu_p13' partition)
[2021-09-28 06:32:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 06:32:50] PULSE: tr8-104B is running for 12:49:24 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8560/ 159576 | consumed samples: 458160 | elapsed time per iteration (ms): 15526.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8570/ 159576 | consumed samples: 459760 | elapsed time per iteration (ms): 15343.9 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8580/ 159576 | consumed samples: 461360 | elapsed time per iteration (ms): 15516.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8590/ 159576 | consumed samples: 462960 | elapsed time per iteration (ms): 15788.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
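Editor's note: from iteration 8520 the lm loss column disappears, the loss scale collapses to 64.0 and then pins at its floor of 1.0, and the per-iteration time drops from ~23 s to ~15.5 s. The likely reading is that every step is overflowing, so the optimizer update (and its loss logging) is skipped while the scaler can back off no further; the frozen "grad norm: 5533.127" would then be the last successfully computed value being re-logged. A sketch of that skip path under those assumptions (gradients_overflowed and the surrounding names are hypothetical; the real logic lives in the DeepSpeed/Megatron fp16 optimizer):

    import torch

    def gradients_overflowed(model):
        # Hypothetical helper: any non-finite value in the fp16 gradients?
        return any(not torch.isfinite(p.grad).all()
                   for p in model.parameters() if p.grad is not None)

    def training_step(batch, model, optimizer, scaler):
        loss = model(batch)
        (loss * scaler.scale).backward()
        if gradients_overflowed(model):
            scaler.update(found_overflow=True)
            optimizer.zero_grad()     # drop the step entirely -- cheaper, hence faster iterations
            return None               # nothing to log: no lm loss, stale grad norm
        scaler.update(found_overflow=False)
        optimizer.step()
        optimizer.zero_grad()
        return loss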
iteration 8600/ 159576 | consumed samples: 464560 | elapsed time per iteration (ms): 15421.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8610/ 159576 | consumed samples: 466160 | elapsed time per iteration (ms): 15365.4 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8620/ 159576 | consumed samples: 467760 | elapsed time per iteration (ms): 15460.6 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8630/ 159576 | consumed samples: 469360 | elapsed time per iteration (ms): 15794.2 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8640/ 159576 | consumed samples: 470960 | elapsed time per iteration (ms): 15928.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8650/ 159576 | consumed samples: 472560 | elapsed time per iteration (ms): 15514.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8660/ 159576 | consumed samples: 474320 | elapsed time per iteration (ms): 16639.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8670/ 159576 | consumed samples: 476080 | elapsed time per iteration (ms): 16569.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8680/ 159576 | consumed samples: 477840 | elapsed time per iteration (ms): 16695.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8690/ 159576 | consumed samples: 479600 | elapsed time per iteration (ms): 16700.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8700/ 159576 | consumed samples: 481360 | elapsed time per iteration (ms): 16569.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8710/ 159576 | consumed samples: 483120 | elapsed time per iteration (ms): 16526.6 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8720/ 159576 | consumed samples: 484880 | elapsed time per iteration (ms): 16370.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8730/ 159576 | consumed samples: 486640 | elapsed time per iteration (ms): 16678.1 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8740/ 159576 | consumed samples: 488400 | elapsed time per iteration (ms): 16715.4 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8750/ 159576 | consumed samples: 490160 | elapsed time per iteration (ms): 16605.2 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8760/ 159576 | consumed samples: 491920 | elapsed time per iteration (ms): 16522.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8770/ 159576 | consumed samples: 493680 | elapsed time per iteration (ms): 16607.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 07:32:48] PULSE: tr8-104B is scheduled to start in 17:38:05 (at 2021-09-29T01:10:54) (1277218 on 'gpu_p13' partition)
[2021-09-28 07:32:48] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
[2021-09-28 07:32:48] PULSE: tr8-104B is running for 13:49:22 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
iteration 8780/ 159576 | consumed samples: 495440 | elapsed time per iteration (ms): 16798.5 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8790/ 159576 | consumed samples: 497200 | elapsed time per iteration (ms): 16594.8 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 8800/ 159576 | consumed samples: 498960 | elapsed time per iteration (ms): 16863.3 | learning rate: 6.000E-05 | global batch size: 176 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
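Editor's note: the stall pattern above is easy to detect mechanically from this log format: a grad norm frozen across many consecutive records (here 5533.127 from iteration 8520 until the kill) means the run has stopped making updates. A small sketch that scans the one-record-per-line format for that symptom:

    import re

    ITER_RE = re.compile(r"iteration\s+(\d+)/.*?grad norm: ([\d.]+)")

    def find_stall(log_text, window=5):
        """Return the first iteration where the grad norm is unchanged for `window` records."""
        records = ITER_RE.findall(log_text)
        for i in range(len(records) - window):
            norms = {norm for _, norm in records[i:i + window]}
            if len(norms) == 1:
                return int(records[i][0])   # e.g. 8520 in the log above
        return None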
Killing subprocess 30115
Killing subprocess 30116
Killing subprocess 30117
Killing subprocess 30118
Main process received SIGTERM, exiting
slurmstepd: error: *** STEP 1271196.0 ON r7i7n6 CANCELLED AT 2021-09-28T07:42:47 ***
[2021-09-28 08:32:52] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 09:33:05] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 10:33:03] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 11:33:17] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 177, in <module>
    main()
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 172, in main
    send_email_alert_job_not_scheduled(args.job_name)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 61, in send_email_alert_job_not_scheduled
    send_email(subject, body)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 39, in send_email
    server = smtplib.SMTP("localhost")
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
[2021-09-28 12:33:29] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 13:33:44] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 14:34:11] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 15:33:54] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
[2021-09-28 16:34:11] PULSE: ***ALERT: tr8-104B is not RUNNING or SCHEDULED! Alert someone at Eng WG***
Traceback (most recent call last):
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 177, in <module>
    main()
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 172, in main
    send_email_alert_job_not_scheduled(args.job_name)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 61, in send_email_alert_job_not_scheduled
    send_email(subject, body)
  File "/gpfswork/rech/six/commun/code/tr8-104B/bigscience/tools/slurm-status.py", line 39, in send_email
    server = smtplib.SMTP("localhost")
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/gpfslocalsup/pub/anaconda-py3/2020.02/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
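Editor's note: the watchdog itself is now failing too: slurm-status.py assumes an SMTP daemon on localhost, the node it runs on has none, so smtplib.SMTP("localhost") raises ConnectionRefusedError and the alert email is never sent. A defensive variant (a sketch only; the addresses are placeholders, and the real fix would be pointing the script at the site's actual mail relay):

    import smtplib

    def send_email(subject, body, host="localhost"):
        """Best-effort alert mail: fall back to printing if no SMTP server is reachable."""
        try:
            server = smtplib.SMTP(host, timeout=10)
        except OSError as exc:                  # ConnectionRefusedError is a subclass of OSError
            print(f"ALERT (email failed: {exc}): {subject}\n{body}")
            return
        try:
            server.sendmail("pulse@localhost", ["eng-wg@localhost"],
                            f"Subject: {subject}\n\n{body}")
        finally:
            server.quit()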
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
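The notice above amounts to a how-to: the launcher exports OMP_NUM_THREADS=1 for each spawned process, and CPU-bound jobs should raise it deliberately. Below is a minimal sketch of honoring the same setting in user code, assuming a standard one-process-per-GPU PyTorch launch; the value of 1 simply mirrors the launcher default and is not prescribed by this log.

# Pin the per-process OpenMP thread count before importing heavy libraries,
# so their intra-op thread pools pick the value up (sketch, not launcher code).
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")  # tune upward if CPU ops dominate

import torch  # imported after the variable is set so torch sees it
torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))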
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
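The same table can be reproduced outside a training run: DeepSpeed ships the `ds_report` command-line tool, and its op builders expose an `is_compatible()` probe that mirrors the "compatible" column. The sketch below assumes the builder names used by recent DeepSpeed releases.

# Probe which extension ops can be JIT-compiled on this system (sketch).
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder, FusedLambBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder(), FusedLambBuilder()):
    # False here means the op would fail to JIT-compile at runtime.
    print(builder.NAME, "compatible:", builder.is_compatible())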
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
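Per the warning above, the async_io op stays unavailable until libaio's development headers are present (`apt install libaio-dev` on Debian/Ubuntu). One hedged way to verify the fix without relaunching the whole job, assuming the AsyncIOBuilder name from recent DeepSpeed releases:

# Should print True once libaio-dev is installed; False reproduces the warning.
from deepspeed.ops.op_builder import AsyncIOBuilder

print("async_io compatible:", AsyncIOBuilder().is_compatible())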
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.async_io ............... [NO] ....... [NO] transformer_inference .. [NO]async_io ....... ...............[OKAY] [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inferencequantizer ................ [NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... async_io[NO] ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja [YES] ...... [OKAY]quantizer .............. [NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... 
[OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalled installed installed .... .. compatible..compatible -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at cpu_adamcpu_adam cpu_adam..............................cpu_adam [YES] ............... [YES]..................... [YES]......[OKAY][YES] [OKAY]............ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- [OKAY][OKAY] JIT compiled ops requires ninja fused_adam ............. [NO]fused_adam .......fused_adamfused_adam............. [OKAY][NO].......................... .......fused_lamb [NO] [NO][OKAY]............. ..............[NO] fused_lamb.......[OKAY][OKAY] ............. [OKAY][NO] fused_lambfused_lamb ................................. [OKAY][NO][NO] .............. [OKAY][OKAY]sparse_attn ............ [NO] ....... [OKAY]sparse_attn ............ [NO] .......transformersparse_attn sparse_attn............ [OKAY] ............ [NO]............ [NO].......[NO]transformer .......................... [OKAY] [OKAY][NO] [OKAY] ....... [OKAY]stochastic_transformer transformertransformer . stochastic_transformer ........................ [NO] .[NO][NO]....... .......[NO][OKAY]....... [OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [OKAY]stochastic_transformer async_io ............... [NO] ....... [NO] stochastic_transformer. [NO]. .......[NO] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op name op name ................ ................................installedop name installed..installed................ compatibleinstalled.... ..--------------------------------------------------compatiblecompatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES]cpu_adam............... ......[YES].............................. ......[YES][YES] [OKAY]...... ...... [OKAY][OKAY] [OKAY] fused_adam ............. [NO] fused_adamfused_adam....... .............[OKAY]............. [NO][NO] .......fused_lamb ....... [OKAY] ............. [OKAY] [NO]fused_adam .......fused_lamb............. fused_lamb [OKAY] .......................... [NO][NO][NO] .............. [OKAY][OKAY]....... [OKAY]sparse_attn ............ [NO] ....... [OKAY] fused_lambtransformer ............sparse_attnsparse_attn ..................................... [NO][NO][NO][NO] ..................... [OKAY] [OKAY][OKAY] transformer....... transformer ........................ stochastic_transformer[NO] [NO][OKAY] . ....... ....... [NO] [OKAY][OKAY]....... [OKAY] stochastic_transformerstochastic_transformer .. sparse_attn [NO] [NO] .............. [OKAY]............[OKAY] [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op name op name................ ................ ................installed ................ installedinstalled ..installed .. .. compatible.. compatiblecompatible ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam ...............cpu_adam...............[YES] [YES] [YES]........................... [OKAY]......[OKAY] [YES] [OKAY] ...... [OKAY] fused_adam fused_adam............. fused_adam ............. [NO]fused_adam............. .......[NO][NO]............. .......[OKAY].......[NO] [OKAY][OKAY] fused_lamb....... .............[OKAY]fused_lamb [NO]fused_lamb............. fused_lamb....... [NO].............[OKAY]............. .......[NO][NO] [OKAY].............. [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn sparse_attn sparse_attn........................transformer [NO]............[NO]............ ....... [NO] .......[NO][OKAY] [OKAY] ....... ....... [OKAY][OKAY]transformer transformer............ transformerstochastic_transformer[NO]............ ....................[NO] [OKAY] [NO][NO] ....... ..............[OKAY] [OKAY]stochastic_transformer[OKAY] .stochastic_transformer [NO] stochastic_transformer....... . [OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name................ op name................ ................installedinstalled................ installed .... installedcompatiblecompatible.. ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] --------------------------------------------------..-------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name cpu_adamcpu_adam ...............cpu_adam............... [YES]cpu_adam...............[YES] ...... ..................... [YES] [OKAY] [OKAY] [YES]...... ......[OKAY] [OKAY] op nameop name................op name ................................................ installed installed installed installed.... compatible....compatible compatible----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- fused_adam ............. fused_adam[NO] fused_adam.................... fused_adam [NO][OKAY] cpu_adamcpu_adam ...............cpu_adam...............cpu_adam [YES][YES].............................. ...... [YES] ......[YES][OKAY] .................... .............[NO]fused_lamb[OKAY] ......[OKAY]...... [OKAY][OKAY] [NO] ............. ....... .......fused_lamb[NO] [OKAY][OKAY] ............. .......[NO] [OKAY]....... fused_adam ............. [NO]fused_adam .................... fused_adam [OKAY]fused_adam fused_lambfused_lamb [OKAY].......................... [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [NO] ............. .............fused_lamb ....... [NO] [NO]............. [OKAY] [NO]....... sparse_attn ............ [NO] sparse_attn....... ............[OKAY] async_io ............... [NO] ....... [NO] ....... ....... fused_lamb[OKAY][OKAY][OKAY] ............. [NO] sparse_attn.......sparse_attntransformer ............[OKAY]........................ transformer_inference .. [NO] ....... [OKAY] [NO]fused_lamb ....... fused_lamb ............. [OKAY] ............. [NO] [NO][NO] transformer ................................. [OKAY][OKAY][OKAY][NO] [NO] [NO]....... sparse_attn....... ............[OKAY][OKAY] ....... stochastic_transformertransformertransformer [OKAY]............ . utils .................. [YES] ...... [OKAY] [NO]sparse_attn ................... [OKAY][NO] ....... [OKAY] ............[NO][NO] [NO]stochastic_transformer.............. .......[OKAY].[OKAY] [OKAY][NO] quantizer .............. [NO] ....... [OKAY] transformer ............transformer [NO]sparse_attn sparse_attn................... [NO]............[OKAY] ............ ....... stochastic_transformer[OKAY]stochastic_transformer . .[NO] [NO]....... .......[OKAY] -------------------------------------------------- ....... [NO] [NO] stochastic_transformer [OKAY]....... ....... . 
[OKAY] [OKAY] [NO]stochastic_transformer [OKAY] .......transformer .[OKAY] transformer[NO]............ ...................[NO] [OKAY][NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- --------------------------------------------------op name ----------------------------------------------------------------------------------------------------................op name op name................installedop name installed.................. ................ ..installed installedcompatible compatible .. --------------------------------------------------.. -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adamcpu_adam ............... .....................[YES]............... [OKAY] ......[YES] [YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adam.............fused_adamfused_lamb [NO]....................................... .......[NO][NO][NO] [OKAY]....... ..............[OKAY] [OKAY] [OKAY]fused_lamb ............. [NO]fused_lambfused_lamb ....... ............. .............[OKAY][NO]sparse_attn [NO]................... .......[NO][OKAY] .......[OKAY] [OKAY] sparse_attn ............ transformer[NO] ................... [NO][OKAY] sparse_attn.......sparse_attn ............[OKAY]transformer............ ............[NO][NO] .......stochastic_transformer[NO] .......[OKAY]....... [OKAY].[OKAY] transformer[NO] ....... stochastic_transformertransformer............[OKAY] .............[NO] [NO][NO]....... ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. 
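The libaio warning above is advisory: async_io is an optional op, and the run proceeds with it disabled ([NO] / [NO]) unless the libaio development headers are installed via `apt install libaio-dev`. A minimal sketch of probing the same prerequisite from Python, assuming a DeepSpeed build that exposes AsyncIOBuilder (the class name and method vary by release, so treat this as illustrative rather than the exact check the launcher runs):

    # Probe whether DeepSpeed's async_io extension can be JIT-built here.
    # AsyncIOBuilder and is_compatible() are assumed from recent DeepSpeed
    # releases; older versions may lay this out differently.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    def async_io_buildable() -> bool:
        # is_compatible() checks the system prerequisites (here, libaio);
        # when they are missing it returns False and emits a warning like
        # the one logged above.
        return AsyncIOBuilder().is_compatible()

    if __name__ == "__main__":
        print("async_io buildable:", async_io_buildable())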
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] [NO] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- [NO]................... transformer[OKAY][NO] ....... ............ [OKAY] .......[NO]transformer [OKAY] ....... cpu_adamcpu_adam cpu_adam............... [YES] ...... [OKAY] ............transformer [OKAY][NO]............ transformer .......[NO]............ stochastic_transformer.......[OKAY] [NO] cpu_adam............... ...............[YES]............... fused_adam......[YES] [OKAY]...... [OKAY]........ stochastic_transformer[NO][OKAY] ........ stochastic_transformer [OKAY] [NO] ............. [YES] [OKAY] [NO]fused_adam...... ........stochastic_transformer [NO][OKAY] .................... [OKAY] [NO] [OKAY]....... ........ [OKAY][NO] ....... [OKAY] fused_adam fused_lamb[OKAY] .......................... [NO][NO] fused_lamb.............. .............[OKAY][OKAY]fused_adam [NO] .................... [OKAY] fused_lamb [NO]............. sparse_attn.......[NO] ................... sparse_attn [OKAY] [NO] ............ [OKAY].......[NO] .......[OKAY] [OKAY] sparse_attnfused_lambtransformertransformer ............ ..................................... [NO][NO][NO] .......[NO] ....... ....... [OKAY] [OKAY] [OKAY] ....... [OKAY]stochastic_transformerstochastic_transformer transformer . ............. [NO] [NO] [NO] ..................... [OKAY] [OKAY][OKAY] stochastic_transformer .sparse_attn [NO] ....... [OKAY] ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name................op name ................................ ................installed installedinstalledinstalled.. .... ..compatiblecompatible compatible--------------------------------------------------compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam ............... [YES] ............... ............... ......[YES] [YES] [YES] [OKAY]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam .............fused_adam fused_adam[NO] ............. ............. ....... [NO]fused_adam[OKAY][NO] ........................... fused_lamb [OKAY] [OKAY].............[NO] fused_lamb[NO].......fused_lamb ....................[OKAY]............. [OKAY][NO][NO] .............. [OKAY][OKAY]fused_lamb ............. [NO] ....... [OKAY]sparse_attn ............ [NO]sparse_attn sparse_attn ....... ........................[OKAY] [NO][NO] .......transformer....... [OKAY]............ [NO]sparse_attn[OKAY] ................... transformer[OKAY]transformer[NO] ............ [NO]................... stochastic_transformer .......[NO][OKAY] .[OKAY] [NO] ....... ....... 
stochastic_transformertransformer [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ............. stochastic_transformer [NO] [NO] ............... [NO][OKAY][OKAY] ....... [OKAY] meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. stochastic_transformer . [NO] ....... [OKAY] JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaJIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY][OKAY] meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name JIT compiled ops requires ninja op name op name................op name................ ................installed................installed installed....installed ..compatible..compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam [YES]............... [YES].....................[YES] ......[YES][OKAY]...... [OKAY]......[OKAY] [OKAY] fused_adam fused_adam............. fused_adam[NO]............. fused_adam.............[NO]....... [NO]....................[OKAY] .......[OKAY][NO] [OKAY]....... fused_lamb [OKAY]............. fused_lamb [NO]fused_lamb............. ....................[NO] fused_lamb [OKAY] [NO] ............. ....... ....... [NO] [OKAY] [OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report sparse_attn ............ [NO]sparse_attn ................... 
sparse_attn[OKAY][NO] meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ................... sparse_attn [NO] transformer [OKAY]............ ...................[NO] [OKAY][NO]....... transformer.......[OKAY] ............[OKAY]transformer meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system [NO]transformer............ ...................[NO] stochastic_transformer [OKAY][NO] ....... .......[OKAY]. [OKAY] meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja [NO]stochastic_transformer ....... stochastic_transformerstochastic_transformer[OKAY]. [NO]. . ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................................op nameop name installedinstalled ................................ .. ..installed installed compatiblecompatible.... ----------------------------------------------------------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam [YES] ...............cpu_adam ............... ...... [YES]............... [YES] [OKAY]............ [YES][OKAY] [OKAY] ...... [OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY]fused_adam .......................... fused_adam[NO][NO] .............fused_lamb.............. .............[OKAY][NO] [OKAY] [NO] fused_lamb.............. fused_lamb ............. [OKAY][OKAY] [NO]............. [NO]....... 
fused_lamb.......[OKAY] .............[OKAY] sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY] sparse_attntransformer sparse_attn ........................ ............[NO][NO] [NO].............. .......[OKAY]sparse_attn [OKAY] [OKAY] ............ transformer[NO]transformerstochastic_transformer ............................... . [NO][OKAY] [NO][NO] transformer....... ....... ................... [OKAY][OKAY] [NO][OKAY] stochastic_transformer....... [OKAY] .stochastic_transformer [NO] ........ [NO][OKAY] stochastic_transformer ....... [OKAY]. [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name ................op name................................ installed ................installed.. installed ..installed compatible .... compatible --------------------------------------------------compatible compatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES]cpu_adamcpu_adam .....................cpu_adam ..............................[OKAY][YES] [YES][YES]...... ............ [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_adamfused_adam fused_lambfused_adam ............. ....................................... [NO][NO] ....... .......[NO][NO] [OKAY][OKAY].............. fused_lamb[OKAY][OKAY] fused_lamb ............. fused_lamb.............[NO] [NO].................... .......[NO][OKAY] sparse_attn[OKAY] ....... ............ [OKAY][NO] ....... [OKAY] sparse_attnsparse_attn ............sparse_attn............ transformer............ [NO][NO]............[NO] ....... ..............[OKAY][NO] [OKAY] [OKAY]....... transformer[OKAY]transformer transformer........................ stochastic_transformer ............ [NO][NO] . .......[NO]....... [NO].......[OKAY][OKAY] .......[OKAY] stochastic_transformer [OKAY]stochastic_transformer stochastic_transformer. .[NO]. [NO].......[NO] .......[OKAY]....... [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... [OKAY]..................[OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] --------------------------------------------------[OKAY]-------------------------------------------------- --------------------------------------------------op name ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name --------------------------------------------------op nameop name................ op nameop name op name................ ................ ................................installed installed installedinstalled.... ....compatiblecompatible compatible-------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- installed................................op name ..installed................installed .. compatibleinstalled .. compatible .. compatible ---------------------------------------------------------------------------------------------------- compatible op name op name ................op name................ installed................installed................ .... installed installed compatiblecompatible.. ..----------------------------------------------------------------------------------------------------compatible compatible cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam ............................................. ...... [YES][YES][YES] [OKAY]...... ......[OKAY]...... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam .............................. ..................... [YES][OKAY][YES] cpu_adamcpu_adam .............................. cpu_adamcpu_adam [YES] [YES] .................................... ...... [OKAY][YES][YES] [OKAY] fused_adam .............fused_adam [NO].............fused_adamfused_adam [NO]................................. ....... [OKAY] [NO][NO] ............ [YES] [OKAY] [OKAY]......fused_adam ............ [OKAY][OKAY] [OKAY] ..............fused_lambfused_lamb [OKAY]............. [OKAY].............[NO] [OKAY]............. [NO] ....... fused_adam[OKAY] fused_adam fused_adam............. .............[NO] [NO].......fused_adam fused_adam .......[OKAY] ............. ............. [OKAY] [NO] fused_lamb [NO]fused_lamb.................... ....... [NO]............. [OKAY][OKAY] ....... 
fused_adam............. fused_adamfused_lamb............. .............[NO] [NO][NO] ............. ..................... [OKAY][OKAY][NO][OKAY] fused_lamb[NO] fused_lamb........................... .............[OKAY][NO] [NO] [OKAY]....... [OKAY] fused_lamb....... [OKAY]fused_lamb............. [OKAY][NO]....... fused_lamb [OKAY] ....... .............[OKAY] fused_lamb [NO].............sparse_attnfused_lamb [NO]................................ [OKAY][NO]....... [NO] .................... [NO][OKAY] ....... [OKAY] sparse_attn sparse_attn............ ............[NO]sparse_attnsparse_attn [NO]............................... ....... [NO][NO][OKAY] [OKAY] ....... ....... [NO] [OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] sparse_attn....... ............[OKAY] transformertransformer[OKAY] [OKAY] transformer sparse_attn............ sparse_attn ............ [NO] ............[NO]....... [NO]....... .......[OKAY][OKAY]sparse_attn ............[OKAY] stochastic_transformer [NO] sparse_attn....... transformer............sparse_attn[OKAY] ............[NO]............transformer [NO] .......[NO] ............ .......[OKAY].......[NO] ........................transformer [NO]transformer [NO]............ ....... .......[OKAY]............ [NO] [OKAY] [NO] transformer....... .transformer ........................[OKAY][NO] ....... [OKAY]....... [OKAY] transformer[OKAY] .......stochastic_transformer[NO] [OKAY] stochastic_transformer [NO][NO] transformer ....... [OKAY]....... [OKAY][OKAY]............ stochastic_transformer............ transformer .stochastic_transformer[NO] [NO] ............ ........ ....... [NO] [NO][OKAY] ........ .stochastic_transformer[NO] [OKAY] ........[NO] [OKAY][NO]....... stochastic_transformer.......[OKAY] ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY][OKAY][OKAY] [NO] .......stochastic_transformer stochastic_transformer [OKAY] [OKAY].............. stochastic_transformer[OKAY][OKAY] .[OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name .. [NO][NO] stochastic_transformer....... .......[OKAY] [OKAY] . [NO] stochastic_transformer....... [OKAY]. [NO] ....... [OKAY] op name................ op name ................................installed .. ................ installed installedcompatible installed .. ..-------------------------------------------------- compatible.. compatible . [NO] ....... [OKAY] --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam cpu_adam ............................................. ...............[YES][YES][YES] [YES].................. ......[OKAY][OKAY] [OKAY][OKAY] fused_adam fused_adam............. fused_adamfused_adam.............[NO] ....... .............[NO]............. [OKAY] ....... [NO] [NO] [OKAY] fused_lamb .............. .............[OKAY]fused_lamb[OKAY] [NO]............. fused_lamb.......[NO]fused_lamb .............[OKAY].................... [NO] [OKAY][NO] .............. [OKAY] [OKAY] sparse_attn ............ [NO]sparse_attn ................... [OKAY][NO]sparse_attn ................... sparse_attn transformer[NO] [OKAY]........................ .......[NO]transformer [NO] .......[OKAY]............ [OKAY].......[NO] transformer....... 
stochastic_transformer[OKAY]............[OKAY] .[NO] [NO].......stochastic_transformer transformer....... [OKAY] [OKAY]............. stochastic_transformer[NO][NO] ............... [OKAY] [NO][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- op name ................op name op name ................ installed ................ ................ .. installedinstalled installed ..compatible.. ..compatiblecompatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam ..................... cpu_adam[OKAY]............... -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- [YES] ...............[YES]...... ......[OKAY] [YES]fused_adam[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ................... [OKAY][NO] ....... [OKAY]fused_adam -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja fused_adam.............fused_lamb [NO].......................... .......[NO][NO]fused_adam ....................[OKAY]....... [OKAY] [OKAY] runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja [NO] .......fused_lambfused_lamb [OKAY].......................... [NO][NO] .............. [OKAY][OKAY]fused_lamb sparse_attn ......................... [NO][NO] .............. [OKAY][OKAY] transformersparse_attnsparse_attn ........................ ............ [NO] [NO] [NO] ....... ....... .......[OKAY] sparse_attn[OKAY][OKAY] ninjaninjaninjaninja .................................... ..................[OKAY].................. [OKAY] [OKAY] --------------------------------------------------[OKAY]-------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name stochastic_transformer ............transformer.transformer [NO] [NO]............ .......................... [OKAY][NO][OKAY][NO] ....... ....... [OKAY][OKAY] transformer stochastic_transformer............stochastic_transformer . . [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] ................................op name op nameinstalled ................................installed.. installed.. installed compatible compatible [OKAY] .. ..-------------------------------------------------- -------------------------------------------------- compatible compatible stochastic_transformer . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [YES][OKAY] cpu_adamcpu_adam ...... ..............................[OKAY] [YES][YES] ............fused_adam [OKAY][OKAY]............. [NO] .......fused_adam [OKAY]............. [NO] .......fused_lamb [OKAY]............. fused_adam fused_adamfused_lamb [NO] .......................... ............. ....... [NO] [NO] [OKAY][NO] ....... ..............[OKAY] [OKAY][OKAY] fused_lambfused_lamb .............sparse_attn............. [NO]............[NO] sparse_attn.......[NO]....... ............ .......[OKAY] [OKAY] [NO][OKAY] ....... [OKAY] transformer ............transformer [NO]............ .......[NO] [OKAY]....... sparse_attnsparse_attn [OKAY] ninjaninjaninjaninja .................. .................. .................................... [OKAY] ........................stochastic_transformer [NO]stochastic_transformer.[NO] ............... [NO] [OKAY] [OKAY][NO]....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
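Every rank prints this pre-flight op report at launch, which is why the raw log shows it interleaved many times over. A minimal sketch for reproducing the same report once, interactively, rather than per rank (assumption: run from inside the tr1-13B conda environment visible in the install paths below; `ds_report` is the diagnostic CLI DeepSpeed installs alongside the wheel):

conda activate tr1-13B   # hypothetical activation step; substitute your own env name
ds_report                # prints the same op compatibility table and environment info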
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
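The async_io op flagged above is only exercised by DeepSpeed's AIO/NVMe offload path, so [NO] is harmless for this run. A hedged sketch of the remedy the warning itself suggests, for machines where system packages can be installed (the DS_BUILD_AIO prebuild flag is from DeepSpeed's documentation; verify it against the version in use):

sudo apt install libaio-dev            # provides the libaio headers async_io needs
DS_BUILD_AIO=1 pip install deepspeed   # optional: pre-build the op instead of JIT compiling it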
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ......torch version torch 1.8, cuda 11.1.................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled .... .. ..compatiblecompatiblecompatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ..............................[YES]............... [YES]......[YES][YES] ...... [OKAY]...... ...... [OKAY] [OKAY] [OKAY] fused_adam fused_adam............. fused_adam fused_adam.............[NO] ..........................[NO]....... [NO] ....... [NO][OKAY] ....... [OKAY] ....... [OKAY][OKAY]fused_lamb fused_lamb............. fused_lambfused_lamb ............. [NO] .............[NO] ............. ....... [NO].......[NO] [OKAY] [OKAY].............. [OKAY] [OKAY] sparse_attnsparse_attnsparse_attnsparse_attn .................................... ............[NO] [NO] [NO] [NO].............. [OKAY][OKAY]....... DeepSpeed general environment info:DeepSpeed general environment info: ....... [OKAY]transformer[OKAY] transformer torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ........................transformer [NO]transformer [NO] ............ ................... ....... [NO][NO][OKAY][OKAY] .............. [OKAY][OKAY] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer .. . [NO]. [NO] [NO]..............[NO] [OKAY].............. [OKAY] nvcc version torch cuda version..................... ...............11.2 [OKAY][OKAY] 11.1deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ......11.2 torch 1.8, cuda 11.1 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninja ninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name op name op name................................ ................installedinstalled ................ ..installed..installed compatible..compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. 
cpu_adam [YES]............... [YES] ............... [YES] ...... ...... [YES] [OKAY]...... [OKAY] ......[OKAY] [OKAY] fused_adamfused_adamfused_adam ..........................fused_adam ............. [NO][NO] [NO]........................... [NO].......[OKAY][OKAY] [OKAY]....... fused_lambfused_lamb .............fused_lamb[OKAY] .............[NO] ............. [NO] .......fused_lamb [NO] [OKAY]....... ............. .......[OKAY][NO] [OKAY]....... [OKAY] sparse_attn ............ sparse_attnsparse_attn[NO] sparse_attn............................... ............ [NO][OKAY] [NO][NO]....... ..............transformer[OKAY] [OKAY][OKAY]............ transformer [NO]transformer ...............................transformer [OKAY] [NO][NO] ............ .............. [NO]stochastic_transformer[OKAY][OKAY] ....... .[OKAY]stochastic_transformer stochastic_transformer [NO] .........stochastic_transformer [NO] [OKAY] [NO] ....... ........ [OKAY] [OKAY] [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name ................op name................ op name installedinstalled ................ ................ .. installed ..installed compatible .. compatible.. -------------------------------------------------- compatible ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam...... cpu_adam............... ............... [OKAY] [YES] ...............[YES] ............[YES] [OKAY][OKAY]...... fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_adam ............. fused_adamfused_lamb[NO]fused_adam ....................................... ....... [NO] [NO][OKAY] [NO] ....... ....... ....... [OKAY]fused_lamb[OKAY] [OKAY]............. fused_lamb[NO] fused_lamb ............. ....... ............. [NO][OKAY] sparse_attn[NO] ....... ...................[OKAY] [OKAY][NO] ....... [OKAY] sparse_attn ............ transformer[NO] ................... [NO][OKAY] sparse_attn ....... sparse_attn ............ transformer[OKAY] ............ [NO] ............[NO]....... stochastic_transformer[NO].......[OKAY] .......[OKAY]. [OKAY] [NO] ....... transformertransformerstochastic_transformer [OKAY] ............ ............ .[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO] [NO]....... 
.......[OKAY] [OKAY] ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] fused_adamfused_adam .......................... [NO] [NO]....... .......[OKAY] [OKAY] fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] [OKAY] transformer transformer............ ............[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. ............... [NO] ....... [NO] async_iotransformer_inference ................. [NO][NO] .............. [NO][OKAY] utils .................. transformer_inference[YES] ........ [NO][OKAY] ....... [OKAY]quantizer .............. [NO] ....... [OKAY]utils .................. 
[YES] --------------------------------------------------...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja async_io ............... [NO] ....... [NO] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] ninja .................. [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. fused_lamb ............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... ......[OKAY] [OKAY] fused_adam .............fused_adam [NO] .................... [NO][OKAY] ....... [OKAY] fused_lamb .............fused_lamb [NO]............. .......[NO] [OKAY]....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer ............transformer [NO]............ .......[NO] [OKAY]....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op name................op name ................ ................................installedinstalled installed installed...... compatible compatible..compatible ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam[YES] ................................................... [YES] [OKAY][YES]...... [YES] ...... [OKAY] ...... [OKAY][OKAY] fused_adam ............. [NO]fused_adam ....... .............fused_adam[OKAY] [NO]fused_adam............. .......[NO] fused_lamb [OKAY]....... ............. .............[OKAY] fused_lamb [NO] ....................fused_lamb[NO] [OKAY] [NO]....... .................... [NO][OKAY] [OKAY]....... [OKAY] fused_lamb .............sparse_attn [NO]............ .......sparse_attn [NO] [OKAY]sparse_attn................... ............[OKAY][NO] [NO] ..............transformer [OKAY][OKAY]............ [NO] .......transformer transformer[OKAY] ........................ [NO][NO] ..............sparse_attn stochastic_transformer[OKAY][OKAY]............ [NO] .stochastic_transformer [NO] stochastic_transformer .............. . .[OKAY] [NO][OKAY] [NO] ....... .......[OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES] ......utils [OKAY].................. [YES] ......quantizer [OKAY] .............. [NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
Every rank prints the same DeepSpeed startup report, and the concurrent writes to a shared stdout interleave character-by-character; a single deduplicated copy of each block follows.

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------

 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------

DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
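This per-rank report is DeepSpeed's standard startup environment check (the same information `ds_report` prints). The interleaving above is simply many ranks writing to one stdout at once; a common way to keep such diagnostics readable is to emit them from a single rank. A minimal sketch, assuming a `torch.distributed` setup (the helper name `rank0_print` is our own illustration, not a Megatron or DeepSpeed API):

import torch.distributed as dist

def rank0_print(*args, **kwargs):
    # Emit diagnostics once instead of once per rank: only global rank 0
    # prints; fall back to plain print for non-distributed runs.
    if not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)

rank0_print("DeepSpeed general environment info: ...")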
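The versions in the environment block can be sanity-checked from Python inside the same conda environment; a quick sketch (nothing below is specific to this setup):

import torch
import deepspeed

# Should agree with the report above: torch 1.8.1 built for CUDA 11.1,
# DeepSpeed 0.4.2+bc17042 from the big-science branch.
print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("deepspeed:", deepspeed.__version__)
print("deepspeed install path:", list(deepspeed.__path__))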
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY]utils .................. [YES]quantizer .................... [OKAY][NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ...................................................... .................. 
[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name op name................................op name ................installedinstalled................ installed....installed compatiblecompatible.... --------------------------------------------------compatiblecompatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam...............cpu_adam............... [YES] ............... .....................[YES] [OKAY][YES][YES] ...... ............[OKAY] [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam fused_adam ....................................... fused_lamb[NO][NO] [NO].................... ....... [NO]....... [OKAY] [OKAY][OKAY] ....... [OKAY] fused_lambfused_lambfused_lamb ....................................... [NO][NO][NO] ....... ....... ....... sparse_attn[OKAY] [OKAY] [OKAY] ............ [NO] ....... [OKAY] transformer ............ [NO] .......sparse_attn sparse_attnsparse_attn [OKAY] .................................... [NO][NO][NO]stochastic_transformer ............... ....... [OKAY] [NO][OKAY][OKAY] .......transformer [OKAY]transformer............ transformer ............[NO] ............ [NO] ....... [NO].......[OKAY] .......[OKAY] [OKAY] stochastic_transformer stochastic_transformerstochastic_transformer. [NO].. .......[NO][NO] [OKAY].............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] /bin/sh: line 0: type: git: not found transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... [OKAY]quantizer .............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. 
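The environment block above can be reproduced directly from Python if one wants to verify a node's setup independently of the launcher; the following is a minimal sketch (not part of the original log), assuming the same conda environment as the run is active:

```python
# Minimal sketch (not from the log) reprinting the fields shown in
# "DeepSpeed general environment info" above; run inside the training env.
import torch
import deepspeed

print("torch install path ...", torch.__file__)         # .../site-packages/torch/__init__.py
print("torch version ........", torch.__version__)      # 1.8.1 in this run
print("torch cuda version ...", torch.version.cuda)     # 11.1 in this run
print("deepspeed version ....", deepspeed.__version__)  # 0.4.2+bc17042 here
```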
using world size: 256, data-parallel-size: 8, tensor-model-parallel size: 4, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/codecarbon
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 8
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1289770.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 5
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 20480
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
global_batch_size ............................... 2048
hidden_dropout .................................. 0.1
hidden_size ..................................... 16384
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 512
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 10
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. None
lr_decay_samples ................................ 126953125
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 216320
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 32
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 32
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 8
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['16', '16', '6_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
save_interval ................................... 300
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 43
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/tr8-104B-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 300000000
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 256
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 16 to global batch size 2048 with batch size increments 16 over 6000000 samples.
> building GPT2BPETokenizer tokenizer ...
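The parallel layout and batch-size schedule in this dump are internally consistent; the sketch below (mine, not from the log; the even spacing of ramp-up increments is an assumption about the scheduler, everything else follows standard Megatron-LM batch accounting) checks the arithmetic:

```python
# Sanity-check sketch for the argument dump above; every constant is copied
# from the dump.
world_size = 256
tp, pp, dp = 4, 8, 8                    # tensor/pipeline/data parallel sizes
assert world_size == tp * pp * dp       # 4 * 8 * 8 = 256 ranks

micro_batch_size, global_batch_size = 1, 2048
# global batch = micro batch * data-parallel replicas * grad-accumulation steps
grad_accum_steps = global_batch_size // (micro_batch_size * dp)
assert grad_accum_steps == 256

# rampup_batch_size = ['16', '16', '6_000_000']: start at 16, step by 16,
# reaching 2048 over 6M samples (even spacing of increments assumed here).
start, step, ramp_samples = 16, 16, 6_000_000
n_increments = (global_batch_size - start) // step      # 127 increments
print(f"~{ramp_samples // n_increments} samples per batch-size increment")
```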
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] DeepSpeed general environment info:DeepSpeed general environment info: torch version .................... 
1.8.1 DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at deepspeed info................... ...................0.4.2+bc17042, bc17042, big-science 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO] ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... [OKAY] utils .................. quantizer[YES] .................... [NO][OKAY] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO] .. .......[NO] [OKAY]....... [OKAY] utilsutils .................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. 
..............[NO] [NO]....... [OKAY] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- ninjaninjaninjaninja .................. ...................................................... [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name................op nameop name ................................installed................ installedinstalledinstalled.. ..compatible.... compatible --------------------------------------------------compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adamcpu_adamcpu_adam[OKAY] ............................................. [YES][YES][YES] .................. [OKAY][OKAY]fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_adam fused_lamb.............fused_adam fused_adam[NO] .......................... ............. [NO]....... [NO] [NO] .......[OKAY]....... .......[OKAY][OKAY] fused_lamb [OKAY] ............. fused_lamb[NO] fused_lamb.................... [NO][OKAY]............. sparse_attn .......[NO] [OKAY] ............ ....... [NO][OKAY] ....... [OKAY] sparse_attn ............ transformer [NO]............ sparse_attn.......[NO] ............ [OKAY] ....... [NO]sparse_attn [OKAY]...................transformer [OKAY]............ [NO] [NO]stochastic_transformer....... .......transformer[OKAY] . [OKAY]............ transformer[NO][NO] ...................stochastic_transformer ....... [NO] [OKAY] .[OKAY] ....... [NO] [OKAY]....... stochastic_transformer[OKAY] stochastic_transformer. [NO] ........ [NO][OKAY] ....... [OKAY] DeepSpeed general environment info: torch install path ...............DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path torch version............... .................... 1.8.1 torch cuda version ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']............... 11.1 torch versionnvcc version ......................................... 1.8.111.2 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']11.1 deepspeed infonvcc version ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] async_ioasync_io .............................. 
[NO]transformer_inference[NO] ................ [NO][NO][NO] ....... [OKAY] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES]transformer_inference ........ [OKAY][NO] utils .................. [YES] ...... [OKAY] transformer_inference....... ..[OKAY]quantizer quantizer .............. [NO] ....... [OKAY] [NO].............. .......[NO] [OKAY]....... utils [OKAY].................. -------------------------------------------------- [YES] ......-------------------------------------------------- utils[OKAY] .................. [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] ....... [OKAY]-------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 -------------------------------------------------- nvcc versionnvcc version .......................................... 
11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] .......transformer_inference [NO].. [NO] ....... [OKAY] utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] quantizer .............. utils[NO] ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1DeepSpeed general environment info: nvcc version ..................... 11.2 deepspeed install pathtorch install path ........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed wheel compiled w. ......torch version torch 1.8, cuda 11.1.................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science torch cuda version ............... 11.1 deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: DeepSpeed general environment info:DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 torch cuda version ............... 11.1 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... [OKAY]...... [OKAY] quantizer .............. 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
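This block is DeepSpeed's standard environment report, printed once per rank. Assuming the 0.4.2+bc17042 install above behaves like stock DeepSpeed, the same summary (together with the op compatibility table that follows) should be reproducible on a node with the bundled diagnostic command:

    # prints the op compatibility table plus "DeepSpeed general environment info:"
    ds_report

The nvcc 11.2 vs. torch cuda 11.1 difference is expected to be benign here, since CUDA 11 provides minor-version compatibility.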
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
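async_io is the only op reported as unbuildable, and only because the libaio headers are missing. A sketch of the remedy for a from-source checkout like the one above (assuming apt access, which on a shared cluster would normally go through the admins):

    apt install libaio-dev             # the fix the warning itself suggests
    DS_BUILD_AIO=1 pip install -e .    # rebuild the DeepSpeed checkout so async_io compiles

Since async_io is only exercised by ZeRO-Infinity's NVMe offload, leaving it unbuilt is harmless unless that feature is enabled.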
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version ....................
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 DeepSpeed general environment info: deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+bc17042, bc17042, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.1 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... 
['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 torch cuda versionnvcc version .................................... 11.111.2 nvcc versiondeepspeed install path ................................ 11.2 ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed install path ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] 0.4.2+bc17042, bc17042, big-sciencedeepspeed info deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 /bin/sh: line 0: type: git: not found nvcc version ..................... 11.2 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] .......async_io [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utilsquantizer ................................ [YES][NO] ............. [OKAY][OKAY] --------------------------------------------------quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... DeepSpeed general environment info: ...............11.1 11.1 nvcc version nvcc version..................... .....................torch install path11.2 11.2 deepspeed install path............... deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info deepspeed info ......................................torch version 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science.................... deepspeed wheel compiled w.1.8.1deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch cuda versiontorch 1.8, cuda 11.1 ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch cuda versiontorch cuda version .............................. 11.111.1 deepspeed info ................... 0.4.2+bc17042, bc17042, big-science nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY]quantizer .............. [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** quantizer .............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 DeepSpeed general environment info:torch cuda version ............... 11.1 nvcc versiontorch install path ..................... 
...............11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] deepspeed info ...................torch version 0.4.2+bc17042, bc17042, big-science.................... deepspeed wheel compiled w.1.8.1 ...... torch 1.8, cuda 11.1torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] DeepSpeed general environment info: transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] ....... [OKAY]utils .................. [YES] ...... [OKAY] utils quantizer ................................ [YES][NO] ............. [OKAY] [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: /bin/sh: line 0: type: git: not found torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2DeepSpeed general environment info: deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] torch install pathdeepspeed info .................................. 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. 
............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+bc17042, bc17042, big-science................... 0.4.2+bc17042, bc17042, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ...............['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']1.8.1 torch cuda versiontorch version ................................... 11.1DeepSpeed general environment info:1.8.1 nvcc version .....................torch cuda version 11.2............... torch install path deepspeed install path 11.1 ............... ........... nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']..................... 11.2deepspeed info deepspeed install path...................['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ...........0.4.2+bc17042, bc17042, big-science torch versiondeepspeed wheel compiled w.['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] .......................... deepspeed info 1.8.1 torch 1.8, cuda 11.1 ................... torch cuda version0.4.2+bc17042, bc17042, big-science ...............deepspeed wheel compiled w. 11.1...... nvcc versiontorch 1.8, cuda 11.1 ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version ..................... .....................11.2 11.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+bc17042, bc17042, big-science0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] /bin/sh: line 0: type: git: not found torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] /bin/sh: line 0: type: git: not found deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+bc17042, bc17042, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: 
git_hash=unknown git_branch=unknown ****
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr1-13B/DeepSpeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+bc17042, bc17042, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
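The 431 dummy tokens in the padded-vocab line come from Megatron rounding the GPT-2 vocabulary (50257) up to a multiple of make-vocab-size-divisible-by times the tensor-parallel size, so the embedding matrix shards evenly across the 4 tensor-parallel ranks. A minimal sketch of that rounding, assuming the usual divisible-by value of 128 (not shown in this log excerpt):

```python
# Sketch of Megatron-style vocab padding (assumes make-vocab-size-divisible-by=128).
def vocab_size_with_padding(orig_vocab_size, divisible_by=128, tp_size=4):
    multiple = divisible_by * tp_size      # 512: each TP rank gets an equal shard
    after = orig_vocab_size
    while after % multiple != 0:
        after += 1
    return after

assert vocab_size_with_padding(50257) == 50688   # 50257 + 431 dummy tokens
```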
> setting codecarbon ...
> setting tensorboard ...
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 8
> setting random seeds to 43 ...
[2021-09-30 03:52:24,170] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/data'
>>> done with dataset index builder. Compilation time: 0.302 seconds
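With tensor parallel = 4 and pipeline parallel = 8 over this job's 256 GPUs, the data-parallel degree comes out to 256 / (4 x 8) = 8. A sketch of how a global rank decomposes into the (pipe, data, model) coordinates that the "Using topology:" map further below lists exhaustively (the axis ordering here is inferred from that map, not taken from the launcher code):

```python
# Rank layout sketch: pipe is the slowest-varying axis, model the fastest.
TP, PP, WORLD = 4, 8, 256
DP = WORLD // (TP * PP)                    # 8-way data parallelism

def coord_to_rank(pipe, data, model):
    return (pipe * DP + data) * TP + model

# Spot-checks against the topology map printed below:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(4, 0, 0) == 128
assert coord_to_rank(7, 7, 3) == 255
```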
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8-104B/Megatron-DeepSpeed-tr8-104B/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
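The "ninja: no work to do." lines mean the fused kernels were already built on a previous run and are simply re-loaded from megatron/fused_kernels/build. For reference, this is roughly what a Megatron-style kernel loader does via PyTorch's JIT extension machinery; a hedged sketch, not the exact Megatron-DeepSpeed code (file names abbreviated):

```python
# Hedged sketch of JIT-building one fused kernel with torch.utils.cpp_extension.
# load() emits a ninja build file and re-runs ninja, which is a no-op when the
# cached objects in build/ are current.
import pathlib
from torch.utils import cpp_extension

srcpath = pathlib.Path("megatron/fused_kernels")
scaled_upper_triang_masked_softmax_cuda = cpp_extension.load(
    name="scaled_upper_triang_masked_softmax_cuda",
    sources=[str(srcpath / "scaled_upper_triang_masked_softmax.cpp"),
             str(srcpath / "scaled_upper_triang_masked_softmax_cuda.cu")],
    build_directory=str(srcpath / "build"),
    extra_cuda_cflags=["-O3"],
)
```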
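The compiler warning retained above was emitted once per rank but is harmless in this run, since every extension module still loads from cache; it fires because the node's default `c++` is not the `g++` PyTorch was built with. For what it's worth, torch.utils.cpp_extension reads the CXX environment variable when picking a compiler, so something like the following, set before the kernels are first built, should make it use the intended one (an assumption about the build environment, not something this log confirms):

```python
# Point PyTorch's JIT extension builder at g++ explicitly; must run before
# any cpp_extension.load() call in the process.
import os
os.environ.setdefault("CXX", "g++")
```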
>>> done with compiling and loading fused kernels. Compilation time: 21.604 seconds
time to initialize megatron (seconds): -17.820
[after megatron is initialized] datetime: 2021-09-30 03:52:46
building GPT model ...
[2021-09-30 03:52:46,250] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/tr1-13B/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-09-30 03:52:46,252] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-30 03:52:46,252] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.22 GB, percent = 20.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=0, data=4, model=0): 16, ProcessCoord(pipe=0, data=4, model=1): 17, ProcessCoord(pipe=0, data=4, model=2): 18, ProcessCoord(pipe=0, data=4, model=3): 19, ProcessCoord(pipe=0, data=5, model=0): 20, ProcessCoord(pipe=0, data=5, model=1): 21, ProcessCoord(pipe=0, data=5, model=2): 22, ProcessCoord(pipe=0, data=5, model=3): 23, ProcessCoord(pipe=0, data=6, model=0): 24, ProcessCoord(pipe=0, data=6, model=1): 25, ProcessCoord(pipe=0, data=6, model=2): 26, ProcessCoord(pipe=0, data=6, model=3): 27, ProcessCoord(pipe=0, data=7, model=0): 28, ProcessCoord(pipe=0, data=7, model=1):
29, ProcessCoord(pipe=0, data=7, model=2): 30, ProcessCoord(pipe=0, data=7, model=3): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=0, model=1): 33, ProcessCoord(pipe=1, data=0, model=2): 34, ProcessCoord(pipe=1, data=0, model=3): 35, ProcessCoord(pipe=1, data=1, model=0): 36, ProcessCoord(pipe=1, data=1, model=1): 37, ProcessCoord(pipe=1, data=1, model=2): 38, ProcessCoord(pipe=1, data=1, model=3): 39, ProcessCoord(pipe=1, data=2, model=0): 40, ProcessCoord(pipe=1, data=2, model=1): 41, ProcessCoord(pipe=1, data=2, model=2): 42, ProcessCoord(pipe=1, data=2, model=3): 43, ProcessCoord(pipe=1, data=3, model=0): 44, ProcessCoord(pipe=1, data=3, model=1): 45, ProcessCoord(pipe=1, data=3, model=2): 46, ProcessCoord(pipe=1, data=3, model=3): 47, ProcessCoord(pipe=1, data=4, model=0): 48, ProcessCoord(pipe=1, data=4, model=1): 49, ProcessCoord(pipe=1, data=4, model=2): 50, ProcessCoord(pipe=1, data=4, model=3): 51, ProcessCoord(pipe=1, data=5, model=0): 52, ProcessCoord(pipe=1, data=5, model=1): 53, ProcessCoord(pipe=1, data=5, model=2): 54, ProcessCoord(pipe=1, data=5, model=3): 55, ProcessCoord(pipe=1, data=6, model=0): 56, ProcessCoord(pipe=1, data=6, model=1): 57, ProcessCoord(pipe=1, data=6, model=2): 58, ProcessCoord(pipe=1, data=6, model=3): 59, ProcessCoord(pipe=1, data=7, model=0): 60, ProcessCoord(pipe=1, data=7, model=1): 61, ProcessCoord(pipe=1, data=7, model=2): 62, ProcessCoord(pipe=1, data=7, model=3): 63, ProcessCoord(pipe=2, data=0, model=0): 64, ProcessCoord(pipe=2, data=0, model=1): 65, ProcessCoord(pipe=2, data=0, model=2): 66, ProcessCoord(pipe=2, data=0, model=3): 67, ProcessCoord(pipe=2, data=1, model=0): 68, ProcessCoord(pipe=2, data=1, model=1): 69, ProcessCoord(pipe=2, data=1, model=2): 70, ProcessCoord(pipe=2, data=1, model=3): 71, ProcessCoord(pipe=2, data=2, model=0): 72, ProcessCoord(pipe=2, data=2, model=1): 73, ProcessCoord(pipe=2, data=2, model=2): 74, ProcessCoord(pipe=2, data=2, model=3): 75, ProcessCoord(pipe=2, data=3, model=0): 76, ProcessCoord(pipe=2, data=3, model=1): 77, ProcessCoord(pipe=2, data=3, model=2): 78, ProcessCoord(pipe=2, data=3, model=3): 79, ProcessCoord(pipe=2, data=4, model=0): 80, ProcessCoord(pipe=2, data=4, model=1): 81, ProcessCoord(pipe=2, data=4, model=2): 82, ProcessCoord(pipe=2, data=4, model=3): 83, ProcessCoord(pipe=2, data=5, model=0): 84, ProcessCoord(pipe=2, data=5, model=1): 85, ProcessCoord(pipe=2, data=5, model=2): 86, ProcessCoord(pipe=2, data=5, model=3): 87, ProcessCoord(pipe=2, data=6, model=0): 88, ProcessCoord(pipe=2, data=6, model=1): 89, ProcessCoord(pipe=2, data=6, model=2): 90, ProcessCoord(pipe=2, data=6, model=3): 91, ProcessCoord(pipe=2, data=7, model=0): 92, ProcessCoord(pipe=2, data=7, model=1): 93, ProcessCoord(pipe=2, data=7, model=2): 94, ProcessCoord(pipe=2, data=7, model=3): 95, ProcessCoord(pipe=3, data=0, model=0): 96, ProcessCoord(pipe=3, data=0, model=1): 97, ProcessCoord(pipe=3, data=0, model=2): 98, ProcessCoord(pipe=3, data=0, model=3): 99, ProcessCoord(pipe=3, data=1, model=0): 100, ProcessCoord(pipe=3, data=1, model=1): 101, ProcessCoord(pipe=3, data=1, model=2): 102, ProcessCoord(pipe=3, data=1, model=3): 103, ProcessCoord(pipe=3, data=2, model=0): 104, ProcessCoord(pipe=3, data=2, model=1): 105, ProcessCoord(pipe=3, data=2, model=2): 106, ProcessCoord(pipe=3, data=2, model=3): 107, ProcessCoord(pipe=3, data=3, model=0): 108, ProcessCoord(pipe=3, data=3, model=1): 109, ProcessCoord(pipe=3, data=3, model=2): 110, ProcessCoord(pipe=3, data=3, model=3): 111, 
ProcessCoord(pipe=3, data=4, model=0): 112, ProcessCoord(pipe=3, data=4, model=1): 113, ProcessCoord(pipe=3, data=4, model=2): 114, ProcessCoord(pipe=3, data=4, model=3): 115, ProcessCoord(pipe=3, data=5, model=0): 116, ProcessCoord(pipe=3, data=5, model=1): 117, ProcessCoord(pipe=3, data=5, model=2): 118, ProcessCoord(pipe=3, data=5, model=3): 119, ProcessCoord(pipe=3, data=6, model=0): 120, ProcessCoord(pipe=3, data=6, model=1): 121, ProcessCoord(pipe=3, data=6, model=2): 122, ProcessCoord(pipe=3, data=6, model=3): 123, ProcessCoord(pipe=3, data=7, model=0): 124, ProcessCoord(pipe=3, data=7, model=1): 125, ProcessCoord(pipe=3, data=7, model=2): 126, ProcessCoord(pipe=3, data=7, model=3): 127, ProcessCoord(pipe=4, data=0, model=0): 128, ProcessCoord(pipe=4, data=0, model=1): 129, ProcessCoord(pipe=4, data=0, model=2): 130, ProcessCoord(pipe=4, data=0, model=3): 131, ProcessCoord(pipe=4, data=1, model=0): 132, ProcessCoord(pipe=4, data=1, model=1): 133, ProcessCoord(pipe=4, data=1, model=2): 134, ProcessCoord(pipe=4, data=1, model=3): 135, ProcessCoord(pipe=4, data=2, model=0): 136, ProcessCoord(pipe=4, data=2, model=1): 137, ProcessCoord(pipe=4, data=2, model=2): 138, ProcessCoord(pipe=4, data=2, model=3): 139, ProcessCoord(pipe=4, data=3, model=0): 140, ProcessCoord(pipe=4, data=3, model=1): 141, ProcessCoord(pipe=4, data=3, model=2): 142, ProcessCoord(pipe=4, data=3, model=3): 143, ProcessCoord(pipe=4, data=4, model=0): 144, ProcessCoord(pipe=4, data=4, model=1): 145, ProcessCoord(pipe=4, data=4, model=2): 146, ProcessCoord(pipe=4, data=4, model=3): 147, ProcessCoord(pipe=4, data=5, model=0): 148, ProcessCoord(pipe=4, data=5, model=1): 149, ProcessCoord(pipe=4, data=5, model=2): 150, ProcessCoord(pipe=4, data=5, model=3): 151, ProcessCoord(pipe=4, data=6, model=0): 152, ProcessCoord(pipe=4, data=6, model=1): 153, ProcessCoord(pipe=4, data=6, model=2): 154, ProcessCoord(pipe=4, data=6, model=3): 155, ProcessCoord(pipe=4, data=7, model=0): 156, ProcessCoord(pipe=4, data=7, model=1): 157, ProcessCoord(pipe=4, data=7, model=2): 158, ProcessCoord(pipe=4, data=7, model=3): 159, ProcessCoord(pipe=5, data=0, model=0): 160, ProcessCoord(pipe=5, data=0, model=1): 161, ProcessCoord(pipe=5, data=0, model=2): 162, ProcessCoord(pipe=5, data=0, model=3): 163, ProcessCoord(pipe=5, data=1, model=0): 164, ProcessCoord(pipe=5, data=1, model=1): 165, ProcessCoord(pipe=5, data=1, model=2): 166, ProcessCoord(pipe=5, data=1, model=3): 167, ProcessCoord(pipe=5, data=2, model=0): 168, ProcessCoord(pipe=5, data=2, model=1): 169, ProcessCoord(pipe=5, data=2, model=2): 170, ProcessCoord(pipe=5, data=2, model=3): 171, ProcessCoord(pipe=5, data=3, model=0): 172, ProcessCoord(pipe=5, data=3, model=1): 173, ProcessCoord(pipe=5, data=3, model=2): 174, ProcessCoord(pipe=5, data=3, model=3): 175, ProcessCoord(pipe=5, data=4, model=0): 176, ProcessCoord(pipe=5, data=4, model=1): 177, ProcessCoord(pipe=5, data=4, model=2): 178, ProcessCoord(pipe=5, data=4, model=3): 179, ProcessCoord(pipe=5, data=5, model=0): 180, ProcessCoord(pipe=5, data=5, model=1): 181, ProcessCoord(pipe=5, data=5, model=2): 182, ProcessCoord(pipe=5, data=5, model=3): 183, ProcessCoord(pipe=5, data=6, model=0): 184, ProcessCoord(pipe=5, data=6, model=1): 185, ProcessCoord(pipe=5, data=6, model=2): 186, ProcessCoord(pipe=5, data=6, model=3): 187, ProcessCoord(pipe=5, data=7, model=0): 188, ProcessCoord(pipe=5, data=7, model=1): 189, ProcessCoord(pipe=5, data=7, model=2): 190, ProcessCoord(pipe=5, data=7, model=3): 191, ProcessCoord(pipe=6, data=0, 
model=0): 192, ProcessCoord(pipe=6, data=0, model=1): 193, ProcessCoord(pipe=6, data=0, model=2): 194, ProcessCoord(pipe=6, data=0, model=3): 195, ProcessCoord(pipe=6, data=1, model=0): 196, ProcessCoord(pipe=6, data=1, model=1): 197, ProcessCoord(pipe=6, data=1, model=2): 198, ProcessCoord(pipe=6, data=1, model=3): 199, ProcessCoord(pipe=6, data=2, model=0): 200, ProcessCoord(pipe=6, data=2, model=1): 201, ProcessCoord(pipe=6, data=2, model=2): 202, ProcessCoord(pipe=6, data=2, model=3): 203, ProcessCoord(pipe=6, data=3, model=0): 204, ProcessCoord(pipe=6, data=3, model=1): 205, ProcessCoord(pipe=6, data=3, model=2): 206, ProcessCoord(pipe=6, data=3, model=3): 207, ProcessCoord(pipe=6, data=4, model=0): 208, ProcessCoord(pipe=6, data=4, model=1): 209, ProcessCoord(pipe=6, data=4, model=2): 210, ProcessCoord(pipe=6, data=4, model=3): 211, ProcessCoord(pipe=6, data=5, model=0): 212, ProcessCoord(pipe=6, data=5, model=1): 213, ProcessCoord(pipe=6, data=5, model=2): 214, ProcessCoord(pipe=6, data=5, model=3): 215, ProcessCoord(pipe=6, data=6, model=0): 216, ProcessCoord(pipe=6, data=6, model=1): 217, ProcessCoord(pipe=6, data=6, model=2): 218, ProcessCoord(pipe=6, data=6, model=3): 219, ProcessCoord(pipe=6, data=7, model=0): 220, ProcessCoord(pipe=6, data=7, model=1): 221, ProcessCoord(pipe=6, data=7, model=2): 222, ProcessCoord(pipe=6, data=7, model=3): 223, ProcessCoord(pipe=7, data=0, model=0): 224, ProcessCoord(pipe=7, data=0, model=1): 225, ProcessCoord(pipe=7, data=0, model=2): 226, ProcessCoord(pipe=7, data=0, model=3): 227, ProcessCoord(pipe=7, data=1, model=0): 228, ProcessCoord(pipe=7, data=1, model=1): 229, ProcessCoord(pipe=7, data=1, model=2): 230, ProcessCoord(pipe=7, data=1, model=3): 231, ProcessCoord(pipe=7, data=2, model=0): 232, ProcessCoord(pipe=7, data=2, model=1): 233, ProcessCoord(pipe=7, data=2, model=2): 234, ProcessCoord(pipe=7, data=2, model=3): 235, ProcessCoord(pipe=7, data=3, model=0): 236, ProcessCoord(pipe=7, data=3, model=1): 237, ProcessCoord(pipe=7, data=3, model=2): 238, ProcessCoord(pipe=7, data=3, model=3): 239, ProcessCoord(pipe=7, data=4, model=0): 240, ProcessCoord(pipe=7, data=4, model=1): 241, ProcessCoord(pipe=7, data=4, model=2): 242, ProcessCoord(pipe=7, data=4, model=3): 243, ProcessCoord(pipe=7, data=5, model=0): 244, ProcessCoord(pipe=7, data=5, model=1): 245, ProcessCoord(pipe=7, data=5, model=2): 246, ProcessCoord(pipe=7, data=5, model=3): 247, ProcessCoord(pipe=7, data=6, model=0): 248, ProcessCoord(pipe=7, data=6, model=1): 249, ProcessCoord(pipe=7, data=6, model=2): 250, ProcessCoord(pipe=7, data=6, model=3): 251, ProcessCoord(pipe=7, data=7, model=0): 252, ProcessCoord(pipe=7, data=7, model=1): 253, ProcessCoord(pipe=7, data=7, model=2): 254, ProcessCoord(pipe=7, data=7, model=3): 255}
[2021-09-30 03:52:47,659] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=7
0: _to_float16
1: EmbeddingPipe
2: <lambda>
3: ParallelTransformerLayerPipe
4: ParallelTransformerLayerPipe
5: ParallelTransformerLayerPipe
6: ParallelTransformerLayerPipe
stage=1 layers=4
7: ParallelTransformerLayerPipe
8: ParallelTransformerLayerPipe
9: ParallelTransformerLayerPipe
10: ParallelTransformerLayerPipe
stage=2 layers=4
11: ParallelTransformerLayerPipe
12: ParallelTransformerLayerPipe
13: ParallelTransformerLayerPipe
14: ParallelTransformerLayerPipe
stage=3 layers=4
15: ParallelTransformerLayerPipe
16: ParallelTransformerLayerPipe
17: ParallelTransformerLayerPipe
18: ParallelTransformerLayerPipe
stage=4 layers=4
19: ParallelTransformerLayerPipe
20: ParallelTransformerLayerPipe
21: ParallelTransformerLayerPipe
22: ParallelTransformerLayerPipe
stage=5 layers=4
23: ParallelTransformerLayerPipe
24: ParallelTransformerLayerPipe
25: ParallelTransformerLayerPipe
26: ParallelTransformerLayerPipe
stage=6 layers=4
27: ParallelTransformerLayerPipe
28: ParallelTransformerLayerPipe
29: ParallelTransformerLayerPipe
30: ParallelTransformerLayerPipe
stage=7 layers=8
31: ParallelTransformerLayerPipe
32: ParallelTransformerLayerPipe
33: ParallelTransformerLayerPipe
34: ParallelTransformerLayerPipe
35: <lambda>
36: MixedFusedLayerNorm
37: EmbeddingPipe
38: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 4): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 5): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 6): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (1, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (2, 7): 1986498560
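Two things worth decoding in the partition map and parameter counts above: the "method type:transformer" balancer distributes only the ParallelTransformerLayerPipe blocks, spreading the 32 of them evenly (4 per stage), while the fp16 cast, embeddings, final LayerNorm and fp32 cast stay pinned to the first and last stages. That is why ranks on pipeline stages 0 and 7 report about 241M more parameters than the middle stages. A toy reconstruction of that balancing, not DeepSpeed's actual partitioner:

```python
# Toy sketch of "type:transformer" partitioning: only transformer blocks are
# balanced; embedding/cast/norm layers ride along on the edge stages.
def balance_transformer_blocks(n_blocks, n_stages):
    base, extra = divmod(n_blocks, n_stages)
    return [base + (1 if s < extra else 0) for s in range(n_stages)]

# 32 transformer blocks over 8 stages -> 4 per stage, matching the log above.
assert balance_transformer_blocks(32, 8) == [4] * 8
```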
used = 38.4 GB, percent = 20.5% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 1986465792 setting training iterations to 159576 > learning rate decay style: cosine DeepSpeed is enabled. [2021-09-30 03:52:48,923] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+bc17042, git-hash=bc17042, git-branch=big-science [2021-09-30 03:52:49,069] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False [2021-09-30 03:52:49,069] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-09-30 03:52:49,069] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer [2021-09-30 03:52:49,069] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-09-30 03:52:49,069] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-09-30 03:52:49,069] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-09-30 03:52:49,070] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000 [2021-09-30 03:52:49,070] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000 [2021-09-30 03:52:49,070] [INFO] [stage2.py:108:__init__] CPU Offload: False [2021-09-30 03:52:49,070] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False [2021-09-30 03:52:53,500] [INFO] [stage2.py:419:__init__] optimizer state initialized [2021-09-30 03:52:53,500] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-09-30 03:52:53,500] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-09-30 03:52:53,500] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-09-30 03:52:53,500] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-09-30 03:52:53,500] [INFO] [config.py:900:print] DeepSpeedEngine configuration: [2021-09-30 03:52:53,500] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-09-30 03:52:53,500] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-09-30 03:52:53,500] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] amp_enabled .................. False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] amp_params ................... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] disable_allgather ............ False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] dump_state ................... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_enabled ........... 
False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] eigenvalue_verbose ........... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] elasticity_enabled ........... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] fp16_enabled ................. True [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] global_rank .................. 0 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] gradient_accumulation_steps .. 256 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] gradient_clipping ............ 1.0 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] loss_scale ................... 0 [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] memory_breakdown ............. False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] optimizer_name ............... None [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] optimizer_params ............. None [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] pld_enabled .................. False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] pld_params ................... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] prescale_gradients ........... False [2021-09-30 03:52:53,501] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_groups .............. 1 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_offset .............. 1000 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_period .............. 1000 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_rounding ............ 0 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_start_bits .......... 16 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_target_bits ......... 8 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_training_enabled .... False [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_type ................ 0 [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] quantize_verbose ............. False [2021-09-30 03:52:53,502] [INFO] [config.py:904:print] scheduler_name ............... 
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] train_batch_size ............. 2048
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 1
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] world_size ................... 8
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-30 03:52:53,502] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-30 03:52:53,502] [INFO] [config.py:906:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 2.048000e+03,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-30 03:52:53,503] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=256 micro_batch_size=1
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=32 STAGE=1 LAYERS=4 [7, 11) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=64 STAGE=2 LAYERS=4 [11, 15) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=96 STAGE=3 LAYERS=4 [15, 19) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=128 STAGE=4 LAYERS=4 [19, 23) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=160 STAGE=5 LAYERS=4 [23, 27) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=192 STAGE=6 LAYERS=4 [27, 31) STAGE_PARAMS=1745293312 (1745.293M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[2021-09-30 03:52:54,122] [INFO] [engine.py:134:__init__] RANK=224 STAGE=7 LAYERS=8 [31, 39) STAGE_PARAMS=1986498560 (1986.499M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
[... the same line is printed by the other ranks of each stage (e.g. ranks 1-3 for stage 0, 33-35 for stage 1, ...); elided ...]
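A quick cross-check of the layout these lines describe (a sketch; the tensor-parallel degree of 4 is derived, not printed anywhere above): 8 pipeline stages (STAGE=0..7) times 8-way ZeRO data parallelism (world_size 8 in the config, and the 8 ZeRO shards loaded below) times 4-way tensor parallelism accounts for the 256 ranks, and TP times the sum of per-stage parameters reproduces TOTAL_PARAMS exactly. Note also that train_batch_size 2048 = 1 micro-batch x 256 accumulation steps x 8 DP replicas; the iteration lines further down show "global batch size: 80" because the batch-size ramp-up toward 2048 is still in progress.

    # Cross-checking the parallel layout against the numbers in the log.
    # PP=8 comes from STAGE=0..7; DP=8 from the DeepSpeed world_size and the
    # "8 ZeRO state_dicts" messages below; TP is then forced by 256 ranks.
    pp, dp, ranks = 8, 8, 256
    tp = ranks // (pp * dp)                  # -> 4
    assert tp * pp * dp == ranks

    stage_params = [1986465792] + [1745293312] * 6 + [1986498560]
    print(tp * sum(stage_params))            # 57778896896 == TOTAL_PARAMS

    # TOTAL_PARAMS - UNIQUE_PARAMS is presumably the tied input/output
    # embedding, counted on both the first and the last pipeline stage:
    print(57778896896 - 56814206976)         # 964689920

    # Target global batch size: micro-batch x grad-accum x DP replicas
    print(1 * 256 * dp)                      # 2048 == train_batch_size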
> using checkpoint value 6e-05 for learning rate
> using checkpoint value 6e-06 for minimum learning rate
> using checkpoint value 216320 for warmup iterations
> using checkpoint value 126953125 for total number of iterations
> using checkpoint value cosine for decay style
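The warmup (216320) and total (126953125) counts here are far larger than the run's 159576 training iterations, which suggests they are counted in samples rather than optimizer steps; the schedule below, evaluated at the logged consumed-sample count, lands close to the logged learning rate, which supports that reading. A minimal sketch of linear warmup plus cosine decay using these checkpoint values (an illustration, not Megatron's exact AnnealingLR code):

    import math

    LR, MIN_LR, WARMUP, TOTAL = 6e-05, 6e-06, 216320, 126953125

    def lr_at(step: int) -> float:
        if step < WARMUP:                              # linear warmup
            return LR * step / WARMUP
        frac = min((step - WARMUP) / (TOTAL - WARMUP), 1.0)
        return MIN_LR + 0.5 * (LR - MIN_LR) * (1 + math.cos(math.pi * frac))

    # ~5.39e-05 at 194400 consumed samples, close to the 5.378E-05 logged
    # at iteration 6220 (small differences in step accounting aside).
    print(lr_at(194400))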
successfully loaded 8 ZeRO state_dicts for rank 0
loading 8 zero partition checkpoints for rank 0
[... the same two messages are printed, interleaved across ranks, for all 256 ranks; elided ...]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints at iteration 6210
time (ms) | load-checkpoint: 56346.23
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-30 03:53:50
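For scale: with "elastic_checkpoint": true in the zero_config above, each rank reports loading all eight ZeRO (data-parallel) shards of optimizer state rather than just its own, presumably so the state can be re-partitioned on resume. That puts the restore on the order of a couple of thousand shard reads from GPFS, which matches the ~56 s load time. A trivial tally:

    # Tally of the restore above: each of the 256 ranks reports loading
    # 8 ZeRO state_dicts plus 8 zero partition checkpoints.
    ranks, dp_shards = 256, 8
    print(ranks * dp_shards)      # 2048 state_dict loads
    print(56346.23 / 1000)        # load-checkpoint wall time: ~56.3 s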
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      300000000
    validation: 1638400
    test:       10240
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.158410 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_300000000ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.256 seconds
    total number of samples: 394611670
    total number of epochs: 3
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_1638400ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.235 seconds
    total number of samples: 6927161
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
    loaded indexed file in 0.069 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-30 03:53:56
done with setup ...
training ...
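The indexmap file names above encode the request: 300000000 samples ("300000000ns") at sequence length 2048 ("2048sl") with seed 43 ("43s"). The document counts are consistent with a 949/50/1 split (the split string itself is an assumption; only the resulting counts appear in the log), and the "3 epochs" for train follows from the per-epoch sample count:

    docs = 304230423
    train_docs = 288714672
    valid_docs = 303926193 - 288714672     # 15211521, matches the log
    test_docs  = 304230423 - 303926193     # 304230, matches the log
    print(train_docs / docs, valid_docs / docs, test_docs / docs)
    # -> ~0.949, ~0.050, ~0.001

    # One pass over the train split yields 394611670 / 3 ~= 131.5M samples,
    # so hitting the 300M-sample target requires 3 epochs:
    import math
    print(math.ceil(300000000 / (394611670 / 3)))   # 3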
time (ms) | model-and-optimizer-setup: 64448.88 | train/valid/test-data-iterators-setup: 5454.03
[before the start of training step] datetime: 2021-09-30 03:53:56
[2021-09-30 03:53:56,830] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-30 03:53:56,830] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-30 03:53:56,830] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 32 total layers
[2021-09-30 03:53:56,830] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-30 03:53:56,830] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 6220 iterations) memory (MB) | allocated: 6689.83056640625 | max allocated: 13931.01416015625 | reserved: 23310.0 | max reserved: 23310.0
[Rank 224] (after 6220 iterations) memory (MB) | allocated: 7107.7119140625 | max allocated: 11917.68994140625 | reserved: 22492.0 | max reserved: 22492.0
[... matching reports from the other first- and last-stage ranks elided ...]
iteration 6220/ 159576 | consumed samples: 194400 | elapsed time per iteration (ms): 30069.8 | learning rate: 5.378E-05 | global batch size: 80 | lm loss: 6.355436E+00 | loss scale: 4096.0 | grad norm: 132438.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 32] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 12114.4677734375 | reserved: 20596.0 | max reserved: 20596.0
[Rank 64] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11842.46728515625 | reserved: 20452.0 | max reserved: 20452.0
[Rank 96] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11570.466796875 | reserved: 20388.0 | max reserved: 20388.0
[Rank 128] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11298.46630859375 | reserved: 19988.0 | max reserved: 19988.0
[Rank 160] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 11026.4658203125 | reserved: 19572.0 | max reserved: 19572.0
[Rank 192] (after 6220 iterations) memory (MB) | allocated: 5861.55029296875 | max allocated: 10754.46533203125 | reserved: 19480.0 | max reserved: 19480.0
[... the same report, with the per-stage values above, is printed by the remaining ranks of each middle stage; elided ...]
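At this point the run is taking ~30 s per iteration at the still-ramping global batch size of 80. A back-of-the-envelope throughput check, with the 2048-token sequence length taken from the indexmap file names above (everything else is straight from the iteration line):

    # Rough throughput implied by the iteration 6220 line.
    samples_per_iter = 80            # current (ramping) global batch size
    ms_per_iter = 30069.8
    seq_len = 2048                   # from the "...2048sl..." indexmap names

    samples_per_s = samples_per_iter / (ms_per_iter / 1000)   # ~2.66
    tokens_per_s = samples_per_s * seq_len                    # ~5.4k tokens/s
    print(round(samples_per_s, 2), round(tokens_per_s))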
iteration 6230/ 159576 | consumed samples: 195200 | elapsed time per iteration (ms): 29715.9 | learning rate: 5.400E-05 | global batch size: 80 | lm loss: 6.325600E+00 | loss scale: 4096.0 | grad norm: 93189.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6240/ 159576 | consumed samples: 196000 | elapsed time per iteration (ms): 29850.6 | learning rate: 5.423E-05 | global batch size: 80 | lm loss: 6.314528E+00 | loss scale: 4096.0 | grad norm: 153013.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:11:21 CEST)" was missed by 0:00:10.154311
[... the same warning, emitted once per process and differing only in the delay (~7-10 s), repeats dozens of times for the 04:11:21, 04:12:21 and 04:13:21 CEST scheduled runs; elided ...]
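These warnings come from the CodeCarbon emissions tracker: its BaseEmissionsTracker._measure_power job runs on a one-minute APScheduler interval, and when the ~30 s training iterations keep each process busy past the job's misfire grace period, APScheduler reports the missed run on every rank. They are noise rather than a training problem. A toy illustration of the mechanism and of the knob that controls the warning (standalone sketch; nothing here is codecarbon's actual code beyond the job name seen in the log):

    import time
    from apscheduler.schedulers.background import BackgroundScheduler

    def measure_power():                 # hypothetical stand-in job
        print("sampling power draw")

    sched = BackgroundScheduler()
    # APScheduler emits 'Run time of job ... was missed by ...' when a run
    # starts later than misfire_grace_time allows (the default is 1 s).
    # Under heavy load the scheduler thread is serviced late, so raising
    # the grace period lets late runs execute quietly instead of warning.
    sched.add_job(measure_power, "interval", minutes=1, misfire_grace_time=30)
    sched.start()
    time.sleep(180)                      # stand-in for the busy training loop
    sched.shutdown()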
WARNING:apscheduler.executors.default:Run time of job
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:13:21 CEST)" was missed by 0:00:07.511153 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:13:21 CEST)" was missed by 0:00:07.407115 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:13:21 CEST)" was missed by 0:00:07.336631 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:13:21 CEST)" was missed by 0:00:07.324806 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:13:21 CEST)" was missed by 0:00:07.455805 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.805411 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.673073 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.981482 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.520498 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.680458 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:05.026200 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.834157 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.968697 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:05.027056 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.825991 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.863229 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.794725 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.962332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.981080 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.896079 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.922735 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.994312 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.873866 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.932435 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.666589 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.713819 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.793773 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.956314 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.983905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.634635 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.670132 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.744374 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.983402 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.827251 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.732936 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.897469 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.776498 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 
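These warnings come from CodeCarbon's emissions tracker: it samples power in a background APScheduler job (BaseEmissionsTracker._measure_power, on a one-minute interval trigger, as the message text itself shows). When the training workload keeps the CPUs and the Python interpreter busy, the scheduler thread wakes late, and APScheduler logs "was missed by ..." and skips that run whenever the delay exceeds the job's misfire_grace_time (one second by default). A minimal standalone sketch of that mechanism, assuming only stock APScheduler; measure_power and the busy loop are hypothetical stand-ins, not CodeCarbon's actual code, and whether the warning fires depends on how loaded the machine is:

    import logging
    from apscheduler.schedulers.background import BackgroundScheduler

    logging.basicConfig(level=logging.WARNING)

    def measure_power():
        pass  # hypothetical stand-in for BaseEmissionsTracker._measure_power

    scheduler = BackgroundScheduler()
    # Same cadence as the log shows: one run per minute. With APScheduler's
    # default misfire_grace_time of 1 second, a wake-up more than 1 s late is
    # logged as 'Run time of job ... was missed by ...' and that run is skipped.
    scheduler.add_job(measure_power, "interval", minutes=1, misfire_grace_time=1)
    scheduler.start()

    # GIL-heavy busy work standing in for the training loop; under enough load
    # the scheduler thread can be delayed past the grace period.
    for _ in range(50):
        sum(i * i for i in range(10_000_000))

    scheduler.shutdown()

Passing a larger misfire_grace_time (or coalesce=True) when the job is added is the stock APScheduler way to quiet this; the warnings appear to be log noise only, since the training iterations below continue normally.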
CEST)" was missed by 0:00:04.931203 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.772888 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.705977 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.694159 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.705781 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.712784 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.880530 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:14:21 CEST)" was missed by 0:00:04.825162 iteration 6250/ 159576 | consumed samples: 196800 | elapsed time per iteration (ms): 29009.9 | learning rate: 5.445E-05 | global batch size: 80 | lm loss: 6.303601E+00 | loss scale: 4096.0 | grad norm: 137433.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6260/ 159576 | consumed samples: 197600 | elapsed time per iteration (ms): 28924.6 | learning rate: 5.467E-05 | global batch size: 80 | lm loss: 6.323338E+00 | loss scale: 4096.0 | grad norm: 108774.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6270/ 159576 | consumed samples: 198400 | elapsed time per iteration (ms): 29624.6 | learning rate: 5.489E-05 | global batch size: 80 | lm loss: 6.321053E+00 | loss scale: 4096.0 | grad norm: 100365.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6280/ 159576 | consumed samples: 199200 | elapsed time per iteration (ms): 29739.7 | learning rate: 5.511E-05 | global batch size: 80 | lm loss: 6.322646E+00 | loss scale: 4096.0 | grad norm: 175808.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.560016 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.719904 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.709566 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.783853 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.752231 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:32:21 CEST)" was missed by 0:00:10.560016
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:33:21 CEST)" was missed by 0:00:08.210198
iteration 6290/ 159576 | consumed samples: 200000 | elapsed time per iteration (ms): 28954.3 | learning rate: 5.534E-05 | global batch size: 80 | lm loss: 6.273073E+00 | loss scale: 4096.0 | grad norm: 97252.912 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.288614
2021-09-30 04:34:21 CEST)" was missed by 0:00:06.451823 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.510241 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.309182 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.346390 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.380586 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.259652 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.379270 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.405920 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.414289 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.357097 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.415641 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.256107 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.439463 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.464737 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.163626 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.117813 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.153313 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.227531 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.195935 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
(trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.509440 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.466579 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.310490 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.216089 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.277924 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.477554 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.197087 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.156313 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.276976 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.188987 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.445633 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.189215 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.177346 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.308340 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:34:21 CEST)" was missed by 0:00:06.363687 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.648136 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.611939 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.346096 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.635765 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.663381 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.200096 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.314108 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.423849 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.705723 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.513679 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.505496 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.542720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.484985 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.660576 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.455982 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.575642 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.602261 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.553415 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.393356 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.452439 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.661072 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.360007 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:35:21 CEST)" was missed by 0:00:04.385266 
iteration 6300/ 159576 | consumed samples: 200800 | elapsed time per iteration (ms): 29277.3 | learning rate: 5.556E-05 | global batch size: 80 | lm loss: 6.295372E+00 | loss scale: 4096.0 | grad norm: 153910.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 6300 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
[2021-09-30 04:38:25,123] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6300/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6300 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
time (ms) | save-checkpoint: 18667.92
[2021-09-30 04:42:10] PULSE: tr8-104B is running for 50:03 since 2021-09-30T03:52:07 (1289770 on 'gpu_p13' partition (r6i4n[5-6,8],r6i5n[4-5],r7i0n[5-8],r7i1n0,r8i2n8,r8i4n1,r8i7n[3-8],r9i0n[0-8],r9i1n[0-8],r9i2n[3-8],r9i3n[7-8],r9i4n[0-2],r9i5n[2,5-7],r9i6n[2-8],r14i7n[1-6])
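For scale, a bit of arithmetic on the two timings just above (values copied from the log, nothing else assumed): the checkpoint write took about 18.7 s against about 29 s per training iteration, so a save costs well under one iteration of wall time at this point in the run.

    # Arithmetic from the log lines above (values copied verbatim).
    save_ms = 18_667.92   # "time (ms) | save-checkpoint"
    iter_ms = 29_277.3    # "elapsed time per iteration (ms)" at iteration 6300
    print(f"checkpoint save ~ {save_ms / iter_ms:.2f} iterations of wall time")
    # -> checkpoint save ~ 0.64 iterations of wall time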
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:10.963852
CEST)" was missed by 0:00:10.872282 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:10.952198 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:10.864201 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:11.038919 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:11.055901 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:10.983584 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:43:21 CEST)" was missed by 0:00:10.852635 iteration 6310/ 159576 | consumed samples: 201600 | elapsed time per iteration (ms): 31055.8 | learning rate: 5.578E-05 | global batch size: 80 | lm loss: 6.324059E+00 | loss scale: 4096.0 | grad norm: 124591.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.738472 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.916856 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.453527 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.767151 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.901695 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.960038 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.760247 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.759012 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.796167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.665853 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:44:21 CEST)" was missed by 0:00:09.914081 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:45:21 CEST)" was missed by 0:00:07.645084
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.041857
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.790857 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.863493 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.931870 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.990495 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.014268 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.039506 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.692615 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.728070 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.802336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.084205 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.041371 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.892144 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.921173 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.852736 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.955440 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.938470 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.039078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.834459 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 
CEST)" was missed by 0:00:05.954112 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.980806 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.052355 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.989123 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.724696 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.731123 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.830931 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.763914 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.738478 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.770788 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.883124 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.771905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.851850 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.752137 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:05.763825 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:46:21 CEST)" was missed by 0:00:06.020483 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.102876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.064855 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.092455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.134795 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.077252 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.135608 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.089670 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.004667 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.039735 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.041067 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.090114 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.091998 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.006044 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.031397 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:47:21 CEST)" was missed by 0:00:03.071041 iteration 6320/ 159576 | consumed samples: 202400 | elapsed time per iteration (ms): 28833.6 | learning rate: 5.600E-05 | global batch size: 80 | lm loss: 6.299813E+00 | loss scale: 4096.0 | grad norm: 122818.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6330/ 159576 | consumed samples: 203200 | elapsed time per iteration (ms): 29174.3 | learning rate: 5.622E-05 | global batch size: 80 | lm loss: 6.322478E+00 | loss scale: 4096.0 | grad norm: 120418.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.253093 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.408180 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.171416 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.203590 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.300998 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.359582 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.199992 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.395825 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.358264 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.383438 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.411079 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:10.947749 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.061786 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.097254 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.232696 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.389556 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.093801 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.453365 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.454255 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.307618 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.141045 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.410560 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 
CEST)" was missed by 0:00:11.261363 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.324609 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.323301 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.349991 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.421554 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.100335 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.408756 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.132960 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.254470 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.290414 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.160106 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.221951 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.221009 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.133140 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.107698 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.121276 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.139973 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 04:58:21 CEST)" was missed by 0:00:11.252277 iteration 6340/ 159576 | consumed samples: 204000 | elapsed time per iteration (ms): 29051.4 | learning rate: 5.645E-05 | global batch size: 80 | lm loss: 6.316248E+00 | loss scale: 4096.0 | grad norm: 133284.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) 
iteration 6350/ 159576 | consumed samples: 204800 | elapsed time per iteration (ms): 27117.2 | learning rate: 5.664E-05 | global batch size: 80 | lm loss: 6.308830E+00 | loss scale: 4096.0 | grad norm: 104470.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[collapsed: the same warning repeated for the runs at 05:03:21, 05:04:21 and 05:05:21 CEST, missed by roughly 10.6-11.1 s, 9.2-9.7 s and 6.0-6.5 s respectively]
iteration 6360/ 159576 | consumed samples: 205600 | elapsed time per iteration (ms): 28711.8 | learning rate: 5.687E-05 | global batch size: 80 | lm loss: 6.295247E+00 | loss scale: 4096.0 | grad norm: 141519.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6370/ 159576 | consumed samples: 206400 | elapsed time per iteration (ms): 28608.5 | learning rate: 5.709E-05 | global batch size: 80 | lm loss: 6.339661E+00 | loss scale: 4096.0 | grad norm: 73871.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6380/ 159576 | consumed samples: 207200 | elapsed time per iteration (ms): 26950.3 | learning rate: 5.729E-05 | global batch size: 80 | lm loss: 6.321135E+00 | loss scale: 2048.0 | grad norm: 41452.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
05:18:21 CEST)" was missed by 0:00:10.175038 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.233611 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.045497 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.269806 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.328219 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.074080 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.106672 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.077633 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.971266 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.013910 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.327394 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.034037 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.181614 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.223948 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.974261 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.094959 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.007096 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.135321 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.128424 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.164398 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.197302 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.295558 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.232300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.935843 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.284569 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.095933 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.967880 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.015111 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.257525 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.282764 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.981679 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:09.995282 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.198646 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.263681 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.126272 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:18:21 CEST)" was missed by 0:00:10.007079 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.781375 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.887890 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.939285 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.924099 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.982523 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.829374 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.911768 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.699826 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.688354 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.760979 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.936604 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.628560 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.476055 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.590106 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.625579 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.981678 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.782743 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.750191 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.949848 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.886630 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.622139 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.937037 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.661413 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.668239 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.789640 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.818720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.852942 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.835935 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.731984 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.851616 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.878300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.669372 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.728429 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.636000 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.661290 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.938898 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.749334 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.649610 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 CEST)" was missed by 0:00:07.917994 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:19:21 
CEST)" was missed by 0:00:07.780626 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.460131 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.602843 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.566623 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.618028 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.268809 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.378539 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.660409 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.661244 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.367078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.615326 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.565330 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.508126 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.154798 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.468365 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.531660 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.628567 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.300865 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.590528 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.304329 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.461484 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.439752 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.514683 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.410697 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.530338 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.557013 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.348089 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.407159 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.340154 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.497459 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.428966 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.307342 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.428046 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.615835 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.314754 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.328329 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.340010 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.347008 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.617685 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.596693 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:20:21 CEST)" was missed by 0:00:05.459345 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.022709 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.081160 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.035233 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.010417 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.037985 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.080305 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.048494 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.035719 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.037563 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:21:21 CEST)" was missed by 0:00:03.016621 iteration 6390/ 159576 | consumed samples: 208000 | elapsed time per iteration (ms): 28738.4 | learning rate: 5.751E-05 | global batch size: 80 | lm loss: 6.297319E+00 | loss scale: 2048.0 | grad norm: 49955.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6400/ 159576 | consumed samples: 208800 | elapsed time per iteration (ms): 28798.3 | learning rate: 5.773E-05 | global batch size: 80 | lm loss: 6.308155E+00 | loss scale: 2048.0 | grad norm: 31443.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:31:21 CEST)" was missed by 0:00:10.866637 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:31:21 CEST)" was missed by 0:00:10.973111 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:31:21 CEST)" was 
[... repeated: the same warning, missed by ~10.6-11.1 s, for the 05:31:21 CEST run; duplicates elided ...]
iteration 6410/ 159576 | consumed samples: 209600 | elapsed time per iteration (ms): 29052.9 | learning rate: 5.795E-05 | global batch size: 80 | lm loss: 6.333957E+00 | loss scale: 2048.0 | grad norm: 36283.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
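About the warning that floods this log: codecarbon's emissions tracker samples power from a job scheduled on an APScheduler interval trigger (one minute here, per the trigger shown), and when the heavily loaded node starves the scheduler thread, APScheduler reports how late each fire time was. Below is a contrived sketch that produces the same warning text; `pause()` stands in for the real cause (a starved process), and `measure_power` is a hypothetical stand-in for `BaseEmissionsTracker._measure_power`.

```python
# Reproduce: WARNING:apscheduler.executors.default:Run time of job "..." was
# missed by ... -- pause() stands in for a CPU-starved scheduler thread.
import logging
import time

from apscheduler.schedulers.background import BackgroundScheduler

logging.basicConfig(level=logging.WARNING)

def measure_power():
    pass  # hypothetical stand-in for the power-sampling callback

sched = BackgroundScheduler()
# coalesce=False reports every missed fire time individually;
# misfire_grace_time=1 flags anything that starts more than 1 s late.
sched.add_job(measure_power, "interval", seconds=2,
              coalesce=False, misfire_grace_time=1)
sched.start()

time.sleep(3)    # the first run fires on time
sched.pause()    # nothing can run for a while...
time.sleep(5)    # ...so two fire times come and go unserved
sched.resume()   # -> WARNING: Run time of job "measure_power ..." was missed by ...
time.sleep(1)
sched.shutdown()
```

These warnings indicate the measurement job is late, not that training itself failed; the loss and grad-norm fields in the surrounding iteration lines are unaffected.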
[... repeated: the same warning, missed by ~7.6-8.1 s (05:32:21 run), ~6.0-6.5 s (05:33:21 run) and ~3.0-3.4 s (05:34:21 run); duplicates elided ...]
iteration 6420/ 159576 | consumed samples: 210400 | elapsed time per iteration (ms): 28687.3 | learning rate: 5.818E-05 | global batch size: 80 | lm loss: 6.311902E+00 | loss scale: 2048.0 | grad norm: 48812.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6430/ 159576 | consumed samples: 211200 | elapsed time per iteration (ms): 28644.5 | learning rate: 5.840E-05 | global batch size: 80 | lm loss: 6.339233E+00 | loss scale: 2048.0 | grad norm: 73811.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... repeated: the same warning, missed by ~9.2-9.7 s, for the 05:42:21 CEST run; duplicates elided ...]
[2021-09-30 05:42:11] PULSE: tr8-104B is running for 1:50:04 since 2021-09-30T03:52:07 (1289770 on 'gpu_p13' partition (r6i4n[5-6,8],r6i5n[4-5],r7i0n[5-8],r7i1n0,r8i2n8,r8i4n1,r8i7n[3-8],r9i0n[0-8],r9i1n[0-8],r9i2n[3-8],r9i3n[7-8],r9i4n[0-2],r9i5n[2,5-7],r9i6n[2-8],r14i7n[1-6])
[... repeated: the same warning, missed by ~7.7-8.3 s, for the 05:43:21 and 05:44:21 CEST runs; duplicates elided ...]
WARNING:apscheduler.executors.default:Run time of job
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.265562 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.124098 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.160072 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.029742 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.259255 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.010692 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.090648 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.002775 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:07.977333 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:07.931537 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.323081 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.091595 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.194312 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.177336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.253227 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.280309 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:07.990999 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:44:21 CEST)" was missed by 0:00:08.122000 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.177887 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.976881 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.083312 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.134791 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.978141 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.883753 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.132026 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.046995 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.145235 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.856786 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.895252 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.119624 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.014141 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.956423 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.073656 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.024838 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.864778 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.824002 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.944703 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 
CEST)" was missed by 0:00:06.671503 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.785593 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.863666 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.985082 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.945648 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.927420 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.082100 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.817542 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.923871 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.132473 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.821051 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.177184 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.113364 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.107290 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.831454 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.856711 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.048442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.031459 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:07.134413 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.976133 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:45:21 CEST)" was missed by 0:00:06.845142 iteration 6440/ 159576 | consumed samples: 212000 | elapsed time per iteration (ms): 29237.2 | learning rate: 5.862E-05 | global batch size: 80 | lm loss: 6.297226E+00 | loss scale: 2048.0 | grad norm: 57023.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.821089 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.620113 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.526937 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.668000 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.726551 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.777991 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.499986 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.538444 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.762838 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.621338 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.599598 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.691526 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.775209 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.690197 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.716868 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 
2021-09-30 05:46:21 CEST)" was missed by 0:00:05.788454 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.725208 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.460744 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.507986 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.775660 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.314703 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.464247 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.506861 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.820373 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.628281 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.657344 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.588848 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.756563 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.674581 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.570606 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.467226 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.567060 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.587930 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.750513 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
(trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.474643 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.428814 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.488222 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.499905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.777603 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:46:21 CEST)" was missed by 0:00:05.619228 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.443264 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.537854 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.243717 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.505190 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.494780 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.216720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.255198 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.338103 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.336886 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.491985 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.406954 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.433607 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.384789 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.177502 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.374091 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.316395 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.224720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.183976 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.304654 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.031474 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.223627 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.479611 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.305601 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.442032 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.283835 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.492456 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.145578 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.181027 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.216640 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.537148 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.345061 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.408358 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.473311 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.391385 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.287398 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.467266 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.191442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.494383 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.205054 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:47:21 CEST)" was missed by 0:00:05.336036 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.943631 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.894989 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.946453 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.706859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.989577 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.788607 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.695460 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.768083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.739036 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.858653 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 
CEST)" was missed by 0:00:05.885315 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.956937 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.836440 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.629192 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.483161 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.668458 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.675316 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.796734 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.931317 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.789836 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.825799 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.924978 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.676445 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.635683 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.735483 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.756389 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.944150 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.643097 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.597297 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.632693 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.668333 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.988839 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.946038 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.757331 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.893808 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.918963 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.860116 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.843136 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.787763 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:48:21 CEST)" was missed by 0:00:05.656820 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.416283 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.321712 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.373167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.133602 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.216536 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.215299 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.252499 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.122141 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.194800 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.370383 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.285357 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.312010 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.383612 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.320396 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.263184 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.055936 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.103158 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.062370 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:03.909867 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.095176 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.059395 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.102030 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.415569 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.223472 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.358015 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.184031 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 05:49:21 CEST)" was missed by 0:00:04.286724 
iteration 6450/ 159576 | consumed samples: 212800 | elapsed time per iteration (ms): 29595.0 | learning rate: 5.884E-05 | global batch size: 80 | lm loss: 6.299403E+00 | loss scale: 2048.0 | grad norm: 65910.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6460/ 159576 | consumed samples: 213600 | elapsed time per iteration (ms): 29891.0 | learning rate: 5.906E-05 | global batch size: 80 | lm loss: 6.318707E+00 | loss scale: 2048.0 | grad norm: 76118.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6470/ 159576 | consumed samples: 214400 | elapsed time per iteration (ms): 29671.9 | learning rate: 5.929E-05 | global batch size: 80 | lm loss: 6.299670E+00 | loss scale: 2048.0 | grad norm: 59518.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6480/ 159576 | consumed samples: 215200 | elapsed time per iteration (ms): 29322.0 | learning rate: 5.951E-05 | global batch size: 80 | lm loss: 6.325890E+00 | loss scale: 2048.0 | grad norm: 50644.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6490/ 159576 | consumed samples: 216000 | elapsed time per iteration (ms): 30024.9 | learning rate: 5.973E-05 | global batch size: 80 | lm loss: 6.311376E+00 | loss scale: 2048.0 | grad norm: 71729.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6500/ 159576 | consumed samples: 216800 | elapsed time per iteration (ms): 30086.9 | learning rate: 5.995E-05 | global batch size: 80 | lm loss: 6.319954E+00 | loss scale: 2048.0 | grad norm: 50618.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6510/ 159576 | consumed samples: 217600 | elapsed time per iteration (ms): 29295.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.319446E+00 | loss scale: 2048.0 | grad norm: 59473.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6520/ 159576 | consumed samples: 218400 | elapsed time per iteration (ms): 28023.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.312102E+00 | loss scale: 1024.0 | grad norm: 43424.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6530/ 159576 | consumed samples: 219200 | elapsed time per iteration (ms): 30083.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.307513E+00 | loss scale: 1024.0 | grad norm: 64246.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6540/ 159576 | consumed samples: 220000 | elapsed time per iteration (ms): 30333.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.300389E+00 | loss scale: 1024.0 | grad norm: 34309.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
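Each iteration record follows a fixed field layout, which makes the run's bookkeeping easy to check mechanically: consumed samples advance by exactly global batch size x 10 = 800 between consecutive records (one record every 10 iterations), and the loss-scale drop from 2048.0 to 1024.0 between iterations 6510 and 6520 is consistent with the mixed-precision loss scaler backing off after a gradient overflow. A small sketch of such a check; the two sample records are abridged transcriptions of the log above, and the regex is inferred from the log format rather than taken from the training code:

    import re

    LOG_RE = re.compile(
        r"iteration\s+(\d+)/\s*\d+ \| consumed samples:\s+(\d+)"
        r".*?global batch size:\s+(\d+).*?loss scale:\s+([\d.]+)"
    )

    records = [
        "iteration 6510/ 159576 | consumed samples: 217600 | ... | "
        "global batch size: 80 | lm loss: 6.319446E+00 | loss scale: 2048.0",
        "iteration 6520/ 159576 | consumed samples: 218400 | ... | "
        "global batch size: 80 | lm loss: 6.312102E+00 | loss scale: 1024.0",
    ]

    prev = None
    for line in records:
        it, samples, gbs, scale = LOG_RE.search(line).groups()
        it, samples, gbs, scale = int(it), int(samples), int(gbs), float(scale)
        if prev is not None:
            # 10 iterations between records at batch size 80 -> 800 samples
            assert samples - prev[1] == gbs * (it - prev[0])
            if scale < prev[2]:
                print(f"loss scale halved before iteration {it}")
        prev = (it, samples, scale)

Note that "number of skipped iterations: 0" still holds across the halving, so the scale reduction happened without the optimizer discarding a logged step.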
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.389328 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.578167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.516270 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.356717 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.104430 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.289729 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.296536 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.610069 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.564929 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.479921 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.506582 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.514983 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.457765 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.250448 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.256936 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.377617 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.253944 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.546273 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:39:21 CEST)" was missed by 0:00:03.360338 
[collapsed: repeated WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00])" was missed, by ~3.2–3.6 s for the 06:39:21 CEST run and ~4.6–5.2 s for the 06:40:21 CEST run]
iteration 6550/ 159576 | consumed samples: 220800 | elapsed time per iteration (ms): 30501.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.314403E+00 | loss scale: 1024.0 | grad norm: 30470.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[collapsed: the same warning repeated for the 06:41:21 and 06:42:21 CEST runs, now ~6.0–6.8 s late]
[2021-09-30 06:42:15] PULSE: tr8-104B is running for 2:50:08 since 2021-09-30T03:52:07 (1289770 on 'gpu_p13' partition (r6i4n[5-6,8],r6i5n[4-5],r7i0n[5-8],r7i1n0,r8i2n8,r8i4n1,r8i7n[3-8],r9i0n[0-8],r9i1n[0-8],r9i2n[3-8],r9i3n[7-8],r9i4n[0-2],r9i5n[2,5-7],r9i6n[2-8],r14i7n[1-6])
[collapsed: the same warning repeated for the 06:43:21 CEST run, ~6.4–6.9 s late]
[collapsed: further repeats of the same warning for the 06:44:21 and 06:45:21 CEST runs, the lag creeping up to ~7.7 s]
iteration 6560/ 159576 | consumed samples: 221600 | elapsed time per iteration (ms): 30277.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.306966E+00 | loss scale: 1024.0 | grad norm: 27994.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[collapsed: the same warning repeated for the 06:46:21 through 06:49:21 CEST runs, the lag growing steadily from ~7.8 s to ~12.9 s; the excerpt ends mid-line in this run of warnings]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:49:21 CEST)" was missed by 0:00:12.717509 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:49:21 CEST)" was missed by 0:00:12.838234 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:49:21 CEST)" was missed by 0:00:12.924915 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:49:21 CEST)" was missed by 0:00:12.738588 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 06:49:21 CEST)" was missed by 0:00:12.869558 iteration 6570/ 159576 | consumed samples: 222400 | elapsed time per iteration (ms): 30604.9 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.300595E+00 | loss scale: 1024.0 | grad norm: 26978.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6580/ 159576 | consumed samples: 223200 | elapsed time per iteration (ms): 30439.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.323712E+00 | loss scale: 1024.0 | grad norm: 23410.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6590/ 159576 | consumed samples: 224000 | elapsed time per iteration (ms): 30455.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.491868E+00 | loss scale: 1024.0 | grad norm: 23219.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6600/ 159576 | consumed samples: 224800 | elapsed time per iteration (ms): 30012.6 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.859193E+00 | loss scale: 1024.0 | grad norm: 21108.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 6600 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints [2021-09-30 07:05:19,729] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6600/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 6600 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints time (ms) | save-checkpoint: 17612.65 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.456136 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.655762 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.396401 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.243892 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.692010 WARNING:apscheduler.executors.default:Run time of 
job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.750322 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.550542 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.549288 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.586486 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.528774 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.517991 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.704425 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.517093 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.707254 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.429235 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.467660 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.436030 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.749504 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.499768 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.717708 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.389955 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.496244 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.679634 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.704910 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.429073 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.557477 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.619463 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.646129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.597281 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.357978 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.685747 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.437219 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.393475 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.706797 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.654570 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.417540 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.620872 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.403899 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.603910 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:07:21 CEST)" was missed by 0:00:10.548570 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.868069 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.042113 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 
CEST)" was missed by 0:00:09.100442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.900700 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.899443 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.936633 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.806305 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.878872 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.054544 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.947319 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.005901 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.740051 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.746546 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.029728 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.057357 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.594070 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.779352 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.817751 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.786137 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.099624 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.907608 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.849918 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.969573 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.067810 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.846341 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.867236 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.055034 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.708109 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.779202 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.970899 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.035879 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.953937 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.996253 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.004630 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.787332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.753994 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.743633 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.767616 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:09.056929 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:08:21 CEST)" was missed by 0:00:08.898599 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.539910 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.552299 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.409620 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.564766 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.250266 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.328003 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.609852 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.610765 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.578058 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.457575 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.516156 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.218318 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.296402 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.446910 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.389188 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.360138 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.256834 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.356583 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.411002 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.316647 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.378415 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.479842 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.297568 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.377519 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.567670 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.104344 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.264231 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.289458 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.417883 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.481165 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.514884 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.565287 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.289676 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.546133 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.464195 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.506530 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.253886 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 
CEST)" was missed by 0:00:08.277858 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.567176 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:09:21 CEST)" was missed by 0:00:08.408888 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.888715 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.901123 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.959539 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.758428 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.737963 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.913569 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.864927 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.599071 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.914011 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.567087 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.958651 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.759764 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.795726 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.727204 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.863604 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.806392 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.605628 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.705386 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.916457 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.453115 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.676830 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.665443 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.708948 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.828644 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.926885 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.646363 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.638453 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.613035 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.638250 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.645263 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.766692 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.829954 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.894925 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.726347 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.602664 WARNING:apscheduler.executors.default:Run time of job 
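The "was missed by" message itself is APScheduler's misfire handling: when a job's actual start lags its scheduled run time by more than misfire_grace_time, the executor skips that run and logs exactly this warning. Here the one-minute job was routinely 7-12 s late, presumably because the busy training processes starved the scheduler's threads. A small sketch that reproduces the mechanism (not the training setup) by starving a single-worker executor; exact output may vary, and max-instances warnings may appear as well:

    import logging
    import time

    from apscheduler.executors.pool import ThreadPoolExecutor
    from apscheduler.schedulers.background import BackgroundScheduler

    logging.basicConfig(level=logging.WARNING)  # surface apscheduler warnings

    # One worker thread: a long-running job starves anything queued behind
    # it, analogous to busy ranks delaying the power-measuring job above.
    scheduler = BackgroundScheduler(executors={"default": ThreadPoolExecutor(1)})

    def hog():
        time.sleep(5)  # occupies the only worker for 5 s

    def measure():
        pass  # stands in for BaseEmissionsTracker._measure_power

    scheduler.add_job(hog, "interval", seconds=6)
    # A run starting more than misfire_grace_time seconds late is dropped and
    # logged as: Run time of job "..." was missed by ...
    scheduler.add_job(measure, "interval", seconds=1, misfire_grace_time=1)
    scheduler.start()
    time.sleep(15)
    scheduler.shutdown()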
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.915953 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.812992 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.855330 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.626669 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:10:21 CEST)" was missed by 0:00:07.757675 iteration 6610/ 159576 | consumed samples: 225600 | elapsed time per iteration (ms): 31558.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.826175E+00 | loss scale: 1024.0 | grad norm: 19041.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.892454 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:09.043306 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.831339 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:09.114048 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.914251 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.913020 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.950233 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.819905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.881709 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:09.068115 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:08.983124 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:11:21 CEST)" was missed by 0:00:09.081386 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
[... dozens of near-identical warnings elided for the 07:11:21 through 07:14:21 CEST runs, each missed by roughly 9 s ...]
CEST)" was missed by 0:00:08.620303 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.805640 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.780224 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.734371 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.812455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.997192 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:09.062128 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.980219 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:09.022518 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:09.030864 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.813610 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.893546 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.769886 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.793876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.805506 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:09.083176 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:14:21 CEST)" was missed by 0:00:08.924904 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:10.032774 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.811248 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:10.000127 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.938197 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.672350 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.678823 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.962026 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.750083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:10.031952 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.974434 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.833030 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.831713 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.868993 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.738660 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.800428 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.986859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.782229 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.901850 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.879689 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.778664 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.987303 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.989730 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.526370 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.640420 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.839957 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.968153 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.928567 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.719636 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.799590 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.711720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.686321 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.675926 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.699978 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.711526 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.718533 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.989243 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.936977 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.903298 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.886331 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:15:21 CEST)" was missed by 0:00:09.830998 iteration 6620/ 
159576 | consumed samples: 226400 | elapsed time per iteration (ms): 30107.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.742154E+00 | loss scale: 1024.0 | grad norm: 28021.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.812963 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.613166 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.611893 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.591415 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.580602 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.682037 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.780296 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.718377 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.459023 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.742228 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.306542 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.530248 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.812149 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.754626 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.649181 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.518824 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.748355 WARNING:apscheduler.executors.default:Run time of 
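For context: APScheduler emits "was missed by ..." when a scheduled job cannot start within its misfire_grace_time; a run missed by more than the grace period is skipped, not executed late. Here the one-minute _measure_power job that codecarbon registers is starved for ~9-12 s, most plausibly because each ~30 s training step keeps the process busy. Below is a minimal sketch of the mechanism, not codecarbon's actual setup: the sampler function, the interval, and the grace setting are illustrative assumptions.

```python
import time
from apscheduler.schedulers.background import BackgroundScheduler

def measure_power():
    # Stand-in for BaseEmissionsTracker._measure_power (illustrative only).
    print("sampling power draw")

scheduler = BackgroundScheduler()
# With the default misfire_grace_time (1 s), any start delayed by more
# than a second is logged as "Run time of job ... was missed by ..."
# and that run is skipped. Passing misfire_grace_time=None tells
# APScheduler to run the job however late it starts, silencing the
# warnings at the cost of late (rather than dropped) samples.
scheduler.add_job(measure_power, "interval", minutes=1, misfire_grace_time=None)
scheduler.start()

# Simulate a main loop that keeps the process busy, as a training step does.
time.sleep(180)
scheduler.shutdown()
```

The skipped runs only mean the tracker loses power samples for those minutes, slightly skewing the energy estimate; the training step itself is unaffected.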
job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.767012 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.562410 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.659842 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.452541 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.558834 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.767479 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.769904 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.491873 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.420609 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.491713 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.498698 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.620139 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.683447 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.708758 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.717126 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.499844 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.579780 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.466488 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.456136 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.769425 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.666488 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.480138 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:16:21 CEST)" was missed by 0:00:09.611122 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.042710 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.193520 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.205923 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.264285 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.063187 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.970129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.031954 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.133353 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.231597 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.111157 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.169685 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.903816 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.910327 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.218796 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 
CEST)" was missed by 0:00:09.221197 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.757854 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.981603 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.949986 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.263455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.064552 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.100469 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.218370 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.013749 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.951127 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.010167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.943214 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.871913 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.199654 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.160084 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.031096 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.907422 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.943054 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.071465 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.917818 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.220732 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.168476 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.134794 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.117821 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:08.931480 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:17:21 CEST)" was missed by 0:00:09.062491 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.971501 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.160373 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.193064 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.993266 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.992029 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.029247 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.898924 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.960695 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.147139 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.062129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.039949 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.098547 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.832689 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.839159 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.122367 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.149978 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.686676 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.910377 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.878776 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.192251 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.000206 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.134747 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.942518 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.938977 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.147636 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.871979 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.800747 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.063540 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.128497 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.088878 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.097228 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.879968 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.959893 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.846608 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.836262 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.871884 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.149548 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.991209 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:10.046582 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:18:21 CEST)" was missed by 0:00:09.860236 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.958685 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.109509 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.897487 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.180248 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.980470 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.979183 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.886121 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.947918 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.134258 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 
CEST)" was missed by 0:00:10.929638 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.049325 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.147580 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.027088 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.085667 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.819797 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.826336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.926082 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.137163 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.673839 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.865944 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.179430 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.121934 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.016425 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.867085 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.134793 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.787883 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.859010 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.987407 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.115634 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.859181 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.823391 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.076039 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.947084 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.136717 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.833792 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.050745 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.084466 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.847436 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:11.033790 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:19:21 CEST)" was missed by 0:00:10.978451 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.018127 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.168967 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.239726 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.038639 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.075892 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.945584 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.007374 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.193750 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.108793 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.207034 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.086570 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.145137 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.733300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.956992 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.925409 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.238878 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.046831 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.181362 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.039962 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.989129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.879321 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.885844 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.985578 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:11.196614 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.847342 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:20:21 CEST)" was missed by 0:00:10.926574 
iteration 6630/ 159576 | consumed samples: 227200 | elapsed time per iteration (ms): 30197.7 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.797427E+00 | loss scale: 1024.0 | grad norm: 27869.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[the warning repeated for the 07:21:21 and 07:22:21 CEST runs, now missed by roughly 11.5 to 12.7 seconds]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.569451 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.362196 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.468430 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.677118 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.679535 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.401493 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.401336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.529728 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.626734 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.408337 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.657997 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.618367 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.409473 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.489401 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.376104 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.365716 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.679035 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.593061 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:22:21 CEST)" was missed by 0:00:12.576101 
iteration 6640/ 159576 | consumed samples: 228000 | elapsed time per iteration (ms): 30533.5 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.645786E+00 | loss scale: 1024.0 | grad norm: 45122.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6650/ 159576 | consumed samples: 228800 | elapsed time per iteration (ms): 30095.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.575365E+00 | loss scale: 1024.0 | grad norm: 33729.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6660/ 159576 | consumed samples: 229600 | elapsed time per iteration (ms): 29459.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.526109E+00 | loss scale: 1024.0 | grad norm: 53212.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
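For reference, the iteration lines carry enough to back out throughput and a rough time-to-finish. A small sketch using the numbers from the "iteration 6660/ 159576" line above (illustrative arithmetic only; the variable names are mine):

# Values copied from the iteration 6660 log line above.
elapsed_ms = 29459.8     # elapsed time per iteration (ms)
global_batch = 80        # global batch size (samples per iteration)
total_iters = 159576
done_iters = 6660

samples_per_sec = global_batch / (elapsed_ms / 1000)                      # ~2.7 samples/s
remaining_days = (total_iters - done_iters) * elapsed_ms / 1000 / 86400
print(f"{samples_per_sec:.2f} samples/s, ~{remaining_days:.0f} days to go")  # ~52 days

That estimate assumes the batch size and per-iteration time stay constant for the rest of the run.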
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:36:21 CEST)" was missed by 0:00:11.756661
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:37:21 CEST)" was missed by 0:00:12.462053
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:40:21 CEST)" was missed by 0:00:11.943541
iteration 6670/ 159576 | consumed samples: 230400 | elapsed time per iteration (ms): 29914.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.445564E+00 | loss scale: 1024.0 | grad norm: 25396.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:41:21 CEST)" was missed by 0:00:11.498024
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:42:21 CEST)" was missed by 0:00:11.149366
[2021-09-30 07:42:10] PULSE: tr8-104B is running for 3:50:03 since 2021-09-30T03:52:07 (1289770 on 'gpu_p13' partition (r6i4n[5-6,8],r6i5n[4-5],r7i0n[5-8],r7i1n0,r8i2n8,r8i4n1,r8i7n[3-8],r9i0n[0-8],r9i1n[0-8],r9i2n[3-8],r9i3n[7-8],r9i4n[0-2],r9i5n[2,5-7],r9i6n[2-8],r14i7n[1-6])
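The node list in the PULSE line above uses SLURM's compact hostlist notation (r9i0n[0-8] stands for r9i0n0 through r9i0n8). On the cluster, 'scontrol show hostnames' expands it; for reading logs offline, here is a minimal expander for the simple prefix[list] form seen in this line (a sketch, not the full SLURM grammar):

import re

def expand_hostlist(expr):
    # Handles only 'prefix[a-b,c,...]' entries like those in the PULSE line above.
    m = re.fullmatch(r"(.*)\[([0-9,\-]+)\]", expr)
    if not m:
        return [expr]  # plain hostname such as r7i1n0
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            hosts.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
        else:
            hosts.append(prefix + part)
    return hosts

print(expand_hostlist("r6i4n[5-6,8]"))  # ['r6i4n5', 'r6i4n6', 'r6i4n8']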
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:10.910201 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.415794 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.358300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.416623 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.351984 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.320731 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.162505 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.345905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.024258 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.095368 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.102330 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.223765 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.166077 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.285729 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.383978 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.103476 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.373563 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.095535 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.070139 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.270103 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.059771 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.287087 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.183471 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.373040 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.312458 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.083752 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:43:21 CEST)" was missed by 0:00:11.214792 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.429549 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.450079 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.356957 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.368467 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.651154 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.487332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.418807 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.605246 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.556576 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.290762 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 
CEST)" was missed by 0:00:11.297262 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.650314 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.592825 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.451372 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.586522 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.555253 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.498040 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.580442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.258779 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.336873 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.520260 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.618519 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.337999 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.397068 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.605720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.608108 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.330087 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.329920 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.458332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.400633 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.144819 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.418007 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.304704 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.546984 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.521679 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.504720 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.294376 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.607647 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.318358 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:44:21 CEST)" was missed by 0:00:11.449385 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.309161 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.329684 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.236557 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.472399 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.530757 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.366927 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.434845 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.176861 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.485254 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.138359 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.248084 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.216468 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.529915 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.330960 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.298420 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.484856 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.498127 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.377656 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.436222 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.170384 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.276646 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.460021 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.487677 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.024364 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.209663 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.401200 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.466154 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.280235 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.399867 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.184276 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.209525 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.487174 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.217647 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.173924 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.337923 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.384256 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.426586 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.297605 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.197881 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:45:21 CEST)" was missed by 0:00:10.328901 iteration 6680/ 159576 | consumed samples: 231200 | elapsed time per iteration (ms): 29985.8 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.389253E+00 | loss scale: 1024.0 | grad norm: 67101.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:46:21 CEST)" was missed by 0:00:11.249301 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:46:21 CEST)" was missed by 0:00:11.424953 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:46:21 CEST)" was missed by 0:00:11.188190 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:46:21 CEST)" was missed by 0:00:11.307056 
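Megatron-LM prints one of these "iteration ..." progress lines at each log interval (every ten iterations here); each is a flat list of "key: value" pairs separated by "|". Below is a minimal, hypothetical parsing sketch (not part of the training code) that turns such a line into a dict, e.g. to plot lm loss or grad norm over time:

```python
import re

# Matches "key: value" pairs between the "|" separators of a Megatron-style
# progress line. Keys may contain spaces, parentheses and slashes.
FIELD_RE = re.compile(r"([a-z][a-z ()/]*?):\s*([0-9.E+-]+)")

def parse_iteration_line(line: str) -> dict:
    """Extract numeric fields from one 'iteration N/ M | ...' log line."""
    fields: dict = {}
    # "iteration 6680/ 159576" has no colon, so handle it separately.
    m = re.search(r"iteration\s+(\d+)/\s*(\d+)", line)
    if m:
        fields["iteration"] = int(m.group(1))
        fields["total_iterations"] = int(m.group(2))
    for key, value in FIELD_RE.findall(line):
        fields[key.strip()] = float(value)
    return fields

sample = ("iteration 6680/ 159576 | consumed samples: 231200 | "
          "elapsed time per iteration (ms): 29985.8 | learning rate: 6.000E-05 | "
          "global batch size: 80 | lm loss: 6.389253E+00 | grad norm: 67101.312")
print(parse_iteration_line(sample))
# {'iteration': 6680, 'total_iterations': 159576, 'consumed samples': 231200.0, ...}
```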
[the same APScheduler warning, repeated many times for the 07:46:21 through 07:50:21 CEST runs with delays of roughly 9-11.5 s, elided]
iteration 6690/ 159576 | consumed samples: 232000 | elapsed time per iteration (ms): 29807.2 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.385251E+00 | loss scale: 1024.0 | grad norm: 32704.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[the same APScheduler warning, repeated for the 07:51:21 through 07:53:21 CEST runs with delays of roughly 9-11 s, elided]
CEST)" was missed by 0:00:10.842570 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.079362 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.972157 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.030719 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.764878 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.771386 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.124462 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.066960 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.125335 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.961468 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.892992 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.874718 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.029412 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.871152 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.079811 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.082224 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.618882 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.732905 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.810994 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.932456 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.925555 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.995733 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.060662 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.994394 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.092663 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.812145 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.054583 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.778781 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.768415 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.804043 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.081704 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.978767 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.804237 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:11.021126 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.892140 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.792435 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:53:21 CEST)" was missed by 0:00:10.923456 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.004803 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.932263 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.226455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.943733 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.025385 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.131875 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.062609 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.994115 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.073319 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.866032 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.872541 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.972323 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.180959 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.834055 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.225622 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.168133 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.026671 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.161824 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.975874 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.193801 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.130593 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.913292 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.155734 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.183379 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.720045 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.033627 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.180532 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.095547 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.905375 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.879953 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.869585 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.905210 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.912217 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.182871 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.096931 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.122259 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:11.993304 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.079975 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 
CEST)" was missed by 0:00:11.893623 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:54:21 CEST)" was missed by 0:00:12.024644 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.209393 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.148263 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.229928 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.136845 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.336413 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.277859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.070584 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.077113 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.267180 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.176871 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.430182 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.431068 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.198700 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.366377 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.385077 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.180416 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.385536 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.387951 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:11.924606 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.038644 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.116721 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.238143 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.372696 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.231258 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.300127 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.398382 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.117869 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.360315 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.109956 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.084507 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.109779 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.387426 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.335218 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.074163 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.326845 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.197876 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.301549 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.284595 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.098241 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:55:21 CEST)" was missed by 0:00:12.229241 iteration 6700/ 159576 | consumed samples: 232800 | elapsed time per iteration (ms): 30231.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.368582E+00 | loss scale: 1024.0 | grad norm: 36497.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.894730 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.967348 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.188924 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.094345 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.906213 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.987870 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.025083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.956563 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.142969 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.035800 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:11.835025 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.145838 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 07:56:21 CEST)" was missed by 0:00:12.188078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
[the same warning repeats for the 07:56:21 through 08:00:21 CEST triggers, missed by roughly 9-13 seconds each time]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:00:21 CEST)" was missed by 0:00:08.807174 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:00:21 CEST)" was missed by 0:00:08.831154 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:00:21 CEST)" was missed by 0:00:08.962147 iteration 6710/ 159576 | consumed samples: 233600 | elapsed time per iteration (ms): 29774.3 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.359943E+00 | loss scale: 1024.0 | grad norm: 39467.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.017020 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.944447 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.192651 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.878149 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.193100 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.955879 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.037555 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.173930 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.987980 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.085463 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.144021 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.984456 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.074784 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.925403 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
(trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.195528 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.732206 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.917326 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.237780 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.180300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.238651 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.038869 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.884769 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.167876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.892094 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.846269 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.924336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.006351 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.107750 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.206007 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.142806 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.881732 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.195027 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.917582 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.045808 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.134442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.005491 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.109195 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.092210 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:09.905900 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:01:21 CEST)" was missed by 0:00:10.036886 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.342814 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.270249 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.518491 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.210513 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.281748 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.400607 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.469861 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.204018 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.518960 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.563595 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.363401 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.499794 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.468548 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.521366 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.506129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.564457 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.364669 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.313873 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.433512 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.531784 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.411346 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.251259 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.310333 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.493741 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.058045 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.172103 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.243200 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.250184 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.332155 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.434902 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 
CEST)" was missed by 0:00:10.460231 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.331271 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.243387 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.217940 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.207597 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.371647 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.520879 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.417948 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.231598 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:02:21 CEST)" was missed by 0:00:10.362589 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.051373 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.123956 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.344683 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.287182 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.144453 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.299603 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.250944 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.985093 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.274794 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.300026 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.839123 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.953137 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.062848 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.345580 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.181735 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.280900 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.094968 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.249653 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.192438 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.032368 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.991689 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.091406 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.302480 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.998993 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.024297 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.301928 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.145792 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.113285 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.312932 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.024501 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:08.988656 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.031305 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.216018 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.214692 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.199043 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.152771 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.241412 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.112443 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.012716 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:03:21 CEST)" was missed by 0:00:09.143713 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.831909 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.904539 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.031541 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.765699 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.772216 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.126116 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.962291 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.080204 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.030209 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.843455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.926341 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.925090 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.080647 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.083064 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.619739 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.811859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.125301 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.067815 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.893815 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.996563 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.061504 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.875595 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.995227 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.093474 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.973050 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 
CEST)" was missed by 0:00:08.812988 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.872037 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.055435 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.805026 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.779634 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.733788 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.804910 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.979581 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.021935 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.892965 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.769286 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:09.082561 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.933351 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.924238 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:04:21 CEST)" was missed by 0:00:08.793253 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.256390 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.183823 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.383393 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.117554 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.124058 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.478000 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.276933 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.314143 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.432483 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.434898 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.195311 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.477162 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.419637 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.278204 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.413330 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.432083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.445337 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.164838 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.407267 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:08.971582 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.085616 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.156738 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.163701 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.245709 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.227451 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.347088 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.382146 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.324907 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.223896 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.156921 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.131481 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.434391 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.285177 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.373810 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.121159 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.244838 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.348505 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.331541 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.145196 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:05:21 CEST)" was missed by 0:00:09.276184 iteration 6720/ 159576 | consumed samples: 234400 | elapsed time per iteration (ms): 29941.4 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.368979E+00 | loss scale: 1024.0 | grad norm: 43688.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:06:21 CEST)" was missed by 0:00:08.766853
[similar warning repeated ~40× for the 08:06:21 run, delays ranging ~0:00:08.55 to 0:00:09.06]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:07:21 CEST)" was missed by 0:00:08.122635
[similar warning repeated ~40× for the 08:07:21 run, delays ranging ~0:00:07.84 to 0:00:08.34]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:08:21 CEST)" was missed by 0:00:07.364303
[similar warning repeated ~40× for the 08:08:21 run, delays ranging ~0:00:07.08 to 0:00:07.59]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.874956
[similar warnings for the 08:09:21 run continue]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21
CEST)" was missed by 0:00:06.782272 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.896772 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.895543 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.050685 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.063891 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.000695 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.750027 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.095782 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.038274 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.864276 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.031949 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.846034 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.965672 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.943516 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.842479 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.025885 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.775500 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.704236 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.775342 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:07.052955 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.903748 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.967048 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.992370 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.783443 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.739693 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.950076 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.863406 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.763743 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:09:21 CEST)" was missed by 0:00:06.894739 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.532991 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.460397 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.658638 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.754574 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.471915 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.553557 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.660017 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.394180 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.709100 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.711500 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.696267 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.554797 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.590768 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.689970 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.708698 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.721937 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.400713 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.248193 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.753783 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.522297 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.624997 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.504078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.601524 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.441455 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.683901 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.433509 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.362264 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.433378 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.440332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.711005 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.608036 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.623725 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.500512 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.408106 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.397757 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.421689 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.561801 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.552687 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.650412 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:10:21 CEST)" was missed by 0:00:05.521458 iteration 6730/ 159576 | consumed samples: 235200 | elapsed time per iteration (ms): 29711.1 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.346693E+00 | loss scale: 1024.0 | grad norm: 42854.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:11:21 CEST)" was missed by 0:00:06.097493 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:11:21 CEST)" was missed by 0:00:06.118028 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:11:21 CEST)" was missed by 0:00:06.024911 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:11:21 CEST)" was missed by 0:00:06.223165 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:11:21 CEST)" was missed by 0:00:05.958672 
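(editor's note: the flood of apscheduler messages collapsed above comes from the emissions tracker, whose once-a-minute power-sampling job, evidently CodeCarbon's BaseEmissionsTracker._measure_power, is serviced late while the training process monopolizes the node. APScheduler logs a miss whenever a job starts more than its misfire_grace_time, 1 s by default, after its scheduled time. Below is a minimal sketch of the pattern and of one way to quiet it, assuming a BackgroundScheduler as the "apscheduler.executors.default" logger name suggests; the measure_power stub and the 30 s grace value are illustrative assumptions, not CodeCarbon's actual code or settings.)

# Sketch only: reproduces the warning pattern above and one way to silence it.
import time
from apscheduler.schedulers.background import BackgroundScheduler

def measure_power():
    pass  # stand-in for BaseEmissionsTracker._measure_power

scheduler = BackgroundScheduler()
# With the default misfire_grace_time of 1 s, any run serviced more than 1 s
# late is skipped and logged as 'was missed by ...'. A larger grace period
# lets a late run execute quietly instead.
scheduler.add_job(measure_power, "interval", seconds=60, misfire_grace_time=30)
scheduler.start()
time.sleep(180)  # the training loop would run here, starving the scheduler thread
scheduler.shutdown()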
[WARNING:apscheduler.executors.default repeated ~200x: Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00])" was missed by 0:00:05.8 to 0:00:09.2; next runs 2021-09-30 08:11:21 through 08:15:21 CEST]
iteration 6740/ 159576 | consumed samples: 236000 | elapsed time per iteration (ms): 30348.0 | learning rate: 6.000E-05 | global batch size: 80 | lm loss: 6.353148E+00 | loss scale: 1024.0 | grad norm: 36346.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
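(editor's note: stripped of the scheduler noise, this stretch of the log carries only the two iteration records above: roughly 30 s per iteration at global batch size 80, with lm loss drifting around 6.35. For reference, a hedged sketch of pulling that series out of a raw log file like this one; the regex, the chosen fields, and the train.log path are assumptions, not project tooling.)

# Minimal sketch for extracting the metrics embedded in iteration lines
# like the two above; file name and field set are assumptions.
import re

LINE_RE = re.compile(
    r"iteration\s+(?P<step>\d+)/\s*\d+ \|"
    r".*?lm loss: (?P<lm_loss>[\d.E+-]+) \|"
    r".*?grad norm: (?P<grad_norm>[\d.]+)"
)

def parse_log(path):
    """Yield (step, lm_loss, grad_norm) tuples from a Megatron-style log."""
    with open(path) as fh:
        for line in fh:
            m = LINE_RE.search(line)
            if m:
                yield int(m["step"]), float(m["lm_loss"]), float(m["grad_norm"])

if __name__ == "__main__":
    for step, loss, gnorm in parse_log("train.log"):
        print(f"step {step}: lm loss {loss:.4f}, grad norm {gnorm:.1f}")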
2021-09-30 08:16:21 CEST)" was missed by 0:00:09.584767 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.270283 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.276785 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.629829 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.572348 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.430898 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.398380 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.380132 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.499767 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.477598 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.376568 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.585205 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.124316 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.309587 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.238325 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.309428 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.566087 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.598048 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:16:21 CEST)" was missed by 0:00:09.534857 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.694309
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.752077 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.621781 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.870404 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.872823 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.601620 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.915075 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.857565 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.915960 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.716099 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.869994 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.665342 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.784997 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.883215 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.762802 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.661793 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.409517 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.523527 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.594646 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:17:21 CEST)" was missed by 0:00:10.872282 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:18:21 CEST)" was missed by 0:00:13.196646
iteration 6750/ 159576 | consumed samples: 236912 | elapsed time per iteration (ms): 31367.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.339949E+00 | loss scale: 1024.0 | grad norm: 36682.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:25:21 CEST)" was missed by 0:00:05.748677
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:25:21 CEST)" was missed by 0:00:05.592925 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:25:21 CEST)" was missed by 0:00:05.616987 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.471876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.414382 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.472768 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.271673 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.308914 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.251168 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.426786 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.222167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.376796 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.319620 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.378167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.112318 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.402025 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.427252 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:08.966342 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.151629 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:26:21 CEST)" was missed by 0:00:09.080341 
iteration 6760/ 159576 | consumed samples: 237872 | elapsed time per iteration (ms): 31713.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.366327E+00 | loss scale: 1024.0 | grad norm: 26158.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:27:21 CEST)" was missed by 0:00:13.715982
iteration 6770/ 159576 | consumed samples: 238832 | elapsed time per iteration (ms): 31633.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.351589E+00 | loss scale: 1024.0 | grad norm: 32550.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:36:21 CEST)" was missed by 0:00:04.867814
iteration 6780/ 159576 | consumed samples: 239792 | elapsed time per iteration (ms): 31040.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.392241E+00 | loss scale: 1024.0 | grad norm: 34799.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:37:21 CEST)" was missed by 0:00:05.685656
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.626797
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.435531 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.628187 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.697073 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.795315 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.467533 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.514805 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.757228 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.784900 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.513681 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.784332 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.595657 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.763341 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.471078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.474130 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.635157 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.698459 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.723802 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.594804 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.681518 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.626139 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:38:21 CEST)" was missed by 0:00:07.495173 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.137084 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.279868 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.174389 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.044065 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.116650 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.292270 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.207290 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.305532 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.185105 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.243641 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.977795 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.292721 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.831772 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.017093 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.991656 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.945823 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 
CEST)" was missed by 0:00:09.055541 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.023919 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.337378 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.338237 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.138468 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.105919 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.087656 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.242351 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.025055 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.984337 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.084121 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.267529 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.295167 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:08.981325 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.016970 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.294607 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.208678 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.273628 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.234031 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.105027 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.145432 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.005397 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.191741 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:39:21 CEST)" was missed by 0:00:09.136354 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.766147 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.921297 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.836319 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.574843 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.645988 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.967274 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.716675 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.871333 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.921779 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.646131 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.684573 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.966404 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.908920 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.767510 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.673116 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.745725 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.934587 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.814155 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.606880 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.713144 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.460854 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.620740 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.803497 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.734956 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.837692 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.863049 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.872731 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.654136 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.613376 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.896580 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.924243 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.610402 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:40:21 CEST)" was missed by 0:00:10.653027 
iteration 6790/ 159576 | consumed samples: 240752 | elapsed time per iteration (ms): 30867.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.349704E+00 | loss scale: 1024.0 | grad norm: 35833.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 08:43:20] PULSE: tr8-104B is running for 4:51:13 since 2021-09-30T03:52:07 (1289770 on 'gpu_p13' partition (r6i4n[5-6,8],r6i5n[4-5],r7i0n[5-8],r7i1n0,r8i2n8,r8i4n1,r8i7n[3-8],r9i0n[0-8],r9i1n[0-8],r9i2n[3-8],r9i3n[7-8],r9i4n[0-2],r9i5n[2,5-7],r9i6n[2-8],r14i7n[1-6])
iteration 6800/ 159576 | consumed samples: 241712 | elapsed time per iteration (ms): 30873.9 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.367813E+00 | loss scale: 1024.0 | grad norm: 41424.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 6810/ 159576 | consumed samples: 242672 | elapsed time per iteration (ms): 30682.6 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.382545E+00 | loss scale: 1024.0 | grad norm: 34278.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
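Since the iteration records above are flat "key: value" fields joined by pipes, they can be pulled out of a noisy log like this one with a short script. A sketch under assumptions (the file name tr8-104B.log and the choice of fields are illustrative, not taken from the log):

    import re

    # Matches Megatron-style records such as:
    #   iteration 6790/ 159576 | ... | lm loss: 6.349704E+00 | ... | grad norm: 35833.809 | ...
    PATTERN = re.compile(
        r"iteration\s+(\d+)/\s*\d+.*?lm loss: ([\dE+.-]+).*?grad norm: ([\d.]+)"
    )

    with open("tr8-104B.log") as f:  # illustrative file name
        for line in f:
            m = PATTERN.search(line)
            if m:
                step = int(m.group(1))
                lm_loss = float(m.group(2))
                grad_norm = float(m.group(3))
                print(step, lm_loss, grad_norm)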
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.136121
[... same warning repeated ~35x for this run, missed by 0:00:03.0-0:00:03.3 ...]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.266615 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.016063 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.233115 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.173520 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.207756 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.190790 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.144514 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.104146 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.135412 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:57:21 CEST)" was missed by 0:00:03.004478 iteration 6820/ 159576 | consumed samples: 243632 | elapsed time per iteration (ms): 30306.3 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.347304E+00 | loss scale: 1024.0 | grad norm: 43929.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.709859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.628205 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.864953 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.660350 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.852589 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.757818 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:58:21 CEST)" was missed by 0:00:03.597788 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 08:59:21 CEST)" was missed by 0:00:05.861499
[... same warning repeated ~40x for this run, missed by 0:00:05.6-0:00:06.1 ...]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:00:21 CEST)" was missed by 0:00:07.885090
[... same warning repeated ~40x for this run, missed by 0:00:07.6-0:00:08.1 ...]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:01:21 CEST)" was missed by 0:00:10.001971
[... same warning repeated ~40x for this run, missed by 0:00:09.7-0:00:10.2 ...]
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.858015
[... same warning repeated ~35x for this run, missed by 0:00:12.5-0:00:13.0 ...]
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.737945 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.744920 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.765078 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:13.016123 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.738091 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.702293 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.859430 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.895391 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.994583 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.928322 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.705331 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.866391 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.955019 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.826040 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.912734 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.929740 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.857344 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:02:21 CEST)" was missed by 0:00:12.726402 iteration 6830/ 159576 | consumed samples: 244592 | elapsed time per iteration (ms): 31010.5 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.375375E+00 | loss scale: 1024.0 | grad norm: 
40351.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6840/ 159576 | consumed samples: 245552 | elapsed time per iteration (ms): 30954.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.360943E+00 | loss scale: 1024.0 | grad norm: 42077.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6850/ 159576 | consumed samples: 246512 | elapsed time per iteration (ms): 30379.2 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.356625E+00 | loss scale: 1024.0 | grad norm: 36705.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 6860/ 159576 | consumed samples: 247472 | elapsed time per iteration (ms): 30489.4 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.331403E+00 | loss scale: 1024.0 | grad norm: 28294.129 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.180514 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.956184 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.037832 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.192940 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.732498 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.239002 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.988327 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.984813 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.193443 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.017432 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.085809 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.144430 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.892399 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.846588 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.924710 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.238106 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.075164 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.925840 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.195936 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.917901 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.039216 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.006698 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.168283 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.195360 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.108138 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.134772 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.143140 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.878625 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.882088 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.917738 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.046183 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.944903 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.174428 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.092473 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.206374 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.885166 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.109485 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.005855 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:04.037097 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:19:21 CEST)" was missed by 0:00:03.906189 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.735114 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.592426 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.697608 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.748014 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.287106 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.792660 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.572006 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.663926 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.646945 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.760887 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 
CEST)" was missed by 0:00:04.446988 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.401161 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.479259 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.749917 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.629760 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.699024 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.433183 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.480393 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.722858 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.472290 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.662681 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.689295 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.750500 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.728975 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.591610 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.436697 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.600750 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.439710 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.560384 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.460666 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.793858 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.472806 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.511281 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.561603 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.748005 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.594153 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.499817 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.543406 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.640862 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:20:21 CEST)" was missed by 0:00:04.539903 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.059103 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.013160 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.776424 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.000790 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.858068 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.906004 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.552701 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.712574 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.859350 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.895321 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.837597 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.808557 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.013651 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.738031 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.744862 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.765011 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.826841 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.928264 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.954876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.026509 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.963270 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.805032 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.058350 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.015529 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.912603 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.964665 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.698797 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.746028 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:06.016118 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.666825 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.866336 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.929625 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.705280 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.988528 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.702283 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.737944 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.994608 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.857245 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.826015 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:21:21 CEST)" was missed by 0:00:05.726316 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.428820 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.227770 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.370473 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.382869 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.146149 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 
CEST)" was missed by 0:00:07.207309 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.282264 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.178263 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.332935 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.275737 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.383371 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:06.922442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.229083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.174748 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.385809 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.107754 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.082333 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.428050 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.265074 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.134741 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.196565 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.297987 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.324622 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:22:21 CEST)" was missed by 0:00:07.396224 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
iteration 6870/ 159576 | consumed samples: 248432 | elapsed time per iteration (ms): 30589.0 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.341326E+00 | loss scale: 1024.0 | grad norm: 33934.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
2021-09-30 09:23:21 CEST)" was missed by 0:00:07.753390 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.707384 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.600215 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.707854 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.470635 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.695018 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.589552 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.531803 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.502770 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.499233 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.246953 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.406784 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.439081 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.752558 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.622491 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.720743 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.710306 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.432289 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.361027 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
(trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.553630 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.459278 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.521102 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.649129 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.657550 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.658886 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.393037 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.396493 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.709769 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.560546 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.606868 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.440283 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.682745 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.623885 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.399542 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.432192 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.520255 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.688880 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.551512 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:23:21 CEST)" was missed by 0:00:07.420584 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.184847 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.983780 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.902154 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.126533 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.138888 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.031735 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.139372 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.678450 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.184034 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.985064 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.890742 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.963339 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.952545 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.934260 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.088976 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.930749 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.863757 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.838333 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.870604 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.021083 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.055300 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.038302 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.053997 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.080654 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.152249 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.090400 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.871776 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.141827 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.792546 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.828018 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.141275 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.992076 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.982935 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.824572 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.831039 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.114231 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 
CEST)" was missed by 0:00:06.851976 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.951726 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:06.863704 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:24:21 CEST)" was missed by 0:00:07.120381 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.598963 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.517361 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.741737 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.800073 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.636237 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.578496 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.754111 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.754557 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.646963 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.756984 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.293644 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.453507 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.485768 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.600308 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.505970 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.567787 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.549487 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.669191 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.486937 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.545966 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.479004 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.799265 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.756478 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.695827 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.767461 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.705571 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.439748 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.729438 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.407742 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.443200 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.478859 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.607256 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.653579 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.704287 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.446228 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.670573 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.566920 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.735556 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.467252 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:25:21 CEST)" was missed by 0:00:07.598250 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.714031 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.632387 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.856758 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.915134 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.693565 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.869128 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.761970 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.869623 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.408705 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.914258 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.751304 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.621006 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.664539 
WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.820594 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.601970 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.661000 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.872065 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.568585 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.522767 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.600824 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.715359 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.682833 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.785559 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.768569 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.784237 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.810891 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.882493 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.819257 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.554756 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.844470 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.594050 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 
CEST)" was missed by 0:00:07.593900 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.871538 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.722310 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.681971 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.558304 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.850582 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.713209 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.561297 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:26:21 CEST)" was missed by 0:00:07.582245 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.329104 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.471876 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.530261 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.247554 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.484295 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.484746 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.023822 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.183678 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.137860 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.529405 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: 
interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.377138 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.459539 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.486628 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.279692 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.276152 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.497621 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.209182 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.330519 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.236149 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.297990 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.173365 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.399409 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.426030 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.434442 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.383761 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.400755 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.176423 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.297118 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.308862 WARNING:apscheduler.executors.default:Run time of job 
"BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.366606 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.487344 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.170035 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.217271 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.197438 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.435911 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.216154 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.328438 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.209169 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.337612 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:27:21 CEST)" was missed by 0:00:08.465882 iteration 6880/ 159576 | consumed samples: 249392 | elapsed time per iteration (ms): 30100.7 | learning rate: 6.000E-05 | global batch size: 96 | lm loss: 6.354124E+00 | loss scale: 1024.0 | grad norm: 26852.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:07.250111 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:06.967422 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:07.249253 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:07.191750 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:07.028587 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power (trigger: interval[0:01:00], next run at: 2021-09-30 09:28:21 CEST)" was missed by 0:00:07.204157 WARNING:apscheduler.executors.default:Run time of job "BaseEmissionsTracker._measure_power 
[the same "BaseEmissionsTracker._measure_power" warning repeats for the scheduled runs from 09:28:21 through 09:32:21 CEST, each missed by roughly 6.7-8.9 s]